Receding Horizon Stochastic Control Algorithms for Sensor Management
Darin Hitchings and David A. Castañón
Abstract— The increasing use of smart sensors that can
dynamically adapt their observations has created a need for
algorithms to control the information acquisition process. While
such problems can usually be formulated as stochastic control
problems, the resulting optimization problems are complex and
difficult to solve in real-time applications. In this paper, we
consider sensor management problems for sensors that are
trying to find and classify objects. We propose alternative
approaches for sensor management based on receding horizon
control using a stochastic control approximation to the sensor
management problem. This approximation can be solved using
combinations of linear programming and stochastic control
techniques for partially observed Markov decision problems
in a hierarchical manner. We explore the performance of our
proposed receding horizon algorithms in simulations using
heterogeneous sensors, and show that their performance is
close to that of a theoretical lower bound. Our results also
suggest that a modest horizon is sufficient to achieve near-
optimal performance.
I. INTRODUCTION
Recent advances in embedded computing have introduced
a new generation of sensors that have the capability of
adapting their sensing dynamically in response to collected
information. For instance, unmanned aerial vehicles (UAVs)
have multiple sensors, such as radar and electro-optical
cameras, which can dynamically change their fields of view and
measurement modes. These advances have created a need
for a commensurate theory of sensor management (SM)
and control to ensure that relevant information is collected
for the mission of the sensor system given the available
sensor resources. There are numerous applications involving
surveillance, diagnosis and fault identification that require
such control.
One of the earliest examples of SM arose in the context
of Search, with applications to anti-submarine warfare [1].
Sensors had the ability to move spatially and allocate their
search effort over time and space. Most of the early work
on search theory focused on open-loop search plans rather
than feedback control of search trajectories [2]. Extensions
of search theory to problems requiring adaptive feedback
strategies have been developed in some restricted contexts
[3].
Adaptive SM has its roots in the field of statistics, where
Bayesian experiment design was used to configure subse-
quent experiments based on observed information. Wald [4],
[5] considered sequential hypothesis testing with costly ob-
servations. Lindley [6] and Kiefer [7] expanded the concepts
to include variations in potential measurements. Chernoff [8]
and Fedorov [9] used Cramér-Rao bounds for selecting
sequences of measurements for nonlinear regression problems.
(This work was supported by a grant from AFOSR. The authors are
with the Dept. of Electrical & Computer Eng., Boston University,
dhitchin@bu.edu, dac@bu.edu.)
Most of the strategies proposed for Bayesian experiment
design involve single-step optimization criteria, resulting
in greedy or myopic strategies that optimize bounds on
the expected performance after the next experiment. Other
approaches to adaptive SM using single-stage optimization
have been proposed using alternative information theoretic
measures [10], [11].
Feedback control approaches to SM that consider opti-
mization over time have also been explored. Athans [12]
considered a two-point boundary value approach to control-
ling the error covariance in linear estimators by choosing
the measurement matrices. Multi-armed bandit formulations
have been used to control individual sensors in applications
related to target tracking [13], [14]. Such approaches are
restricted to single-sensor control, selecting among individual
subproblems to measure, in order to obtain solutions using
Gittins indices [15], [16]. Approximate dynamic program-
ming (DP) techniques have also been proposed using ap-
proximations to the optimal cost-to-go based on information
theoretic measures evaluated using Monte Carlo techniques
[17], [18]. A good overview of these techniques is available
in [19].
The above approaches for dynamic feedback control are
limited in application to problems with a small number
of sensor-action choices and simple constraints because the
algorithms must enumerate and evaluate the various control
actions. In [20], combinatorial optimization techniques are
integrated into a DP formulation to obtain approximate
stochastic dynamic programming (SDP) algorithms that ex-
tend to large numbers of sensor actions. Subsequent work
in [21] derived an SDP formulation using partially ob-
served Markov decision processes (POMDPs) and obtained
a computable lower bound to the achievable performance of
feedback strategies for complex multi-sensor management
problems. The lower bound was obtained by a convex
relaxation of the original combinatorial POMDP using mixed
strategies and averaged constraints. However, the results in
[21] do not specify algorithms with performance close to the
lower bound.
In this paper, we develop and implement algorithms for the
efficient computation of adaptive SM strategies for complex
problems involving multiple sensors with different observa-
tion modes and large numbers of objects. The algorithms
are based on using the lower bound formulation from [21]
as an objective in a receding horizon (RH) optimization
problem and developing techniques for obtaining feasible
decisions from the mixed strategy solutions. The resulting
algorithms are scalable to large numbers of tasks, and
suitable for real-time SM. We also extend the model of [21]
to incorporate search actions in addition to classification.
We evaluate alternative approaches for obtaining feasible
decision strategies, and evaluate the resulting performance of
the RH algorithms using multi-sensor simulations. Our sim-
ulation results demonstrate that our RH algorithms achieve
performance comparable to the predicted lower bound of [21]
and shed insight into the relative value of different strategies
for partitioning sensor resources either geographically or by
sensor specialization.
The rest of this paper is organized as follows: Section II
describes the formulation of the stochastic SM problem.
Section III provides an example of the column generation
technique for generating mixed strategies for SM. Section IV
discusses how we create feasible, sequenced sensor schedules
from these mixed strategies. Section V documents our sim-
ulation results for various scenarios. Section VI summarizes
our results and discusses areas for future work.
II. PROBLEM FORMULATION AND BACKGROUND
The problem formulation is an extension of the POMDP
formulation presented in [21]. Assume that there are a finite
number of locations 1, . . . , N, each of which may have an
object with a given type, or which may be empty. Assume
that there is a set of S sensors, each of which has multiple
sensor modes, and that each sensor can observe one and only
one location at each time with a selected mode.
Let xi ∈ {0, 1, . . . , D} denote the state of location i,
where xi = 0 if location i is unoccupied, and otherwise
xi = k > 0 indicates location i has an object of type
k. Let πi(0) be a discrete probability distribution over the
D + 1 possible states of the ith location, for i = 1, . . . , N,
where D ≥ 2. Assume additionally that the random variables
xi, i = 1, . . . , N are mutually independent.
There are s = 1, . . . , S sensors, each of which has m =
1, . . . , Ms possible modes of observation. We assume there
is a series of T discrete decision stages where sensors can
select which location to measure, where T is large enough
so that all of the sensors can use their available resources.
At each stage, each sensor can choose to employ one and
only one of its modes on a single location to collect a noisy
measurement concerning the state xi at that location. Each
sensor s has a limited set of locations that it can observe,
denoted by Os ⊆ {1, . . . , N}. A sensor action by sensor s
at stage t is a pair:
us(t) = (is(t), ms(t)) (1)
consisting of a location to observe, is ∈ Os, and a mode for
that observation, ms.
Sensor measurements are modeled as belonging to a finite
set y ∈ {1, . . . , Ls}. The likelihood of the measured value
is assumed to depend on the sensor s, sensor mode m,
location i and on the true state at the location xi but not
on the states of other locations. Denote this likelihood as
P(y|xi, i, s, m). We assume that this likelihood is time-
invariant, and that the random measurements yi,s,m(t) are
conditionally independent of other measurements yj,σ,n(τ)
given the location states xi, xj, for all sensor modes m, n,
provided i ≠ j or τ ≠ t.
Each sensor s has a limited quantity Rs of resources
available for measurements. Associated with the use of mode m
by sensor s on location i is a resource cost rs(us(t)),
representing power or some other type of resource required to
use this mode from this sensor:
Σ_{t=0}^{T−1} rs(us(t)) ≤ Rs   ∀ s ∈ {1, . . . , S}     (2)
This is a hard constraint for each realization of observations
and decisions.
Let I(t) denote the sequence of past sensing actions and
measurement outcomes up to and including stage t − 1:
I(t) = {(us(k), ys(k)), s = 1, . . . , S; k = 0, . . . , t − 1}
Under the assumption of conditional independence of mea-
surements and independence of individual states at each lo-
cation, the conditional probability of (x1, . . . , xN ) given I(t)
can be factored as a product of belief states at each location.
Denote the belief state at location i as πi(t) = p(xi|I(t)).
When a sensor measurement is taken, the belief state is up-
dated according to Bayes’ Rule. A measurement of location i
with the sensor-mode combination us(t) = (i, m) at stage t
that generates observable y(t) updates the belief vector as:
πi(t + 1) = diag{P(y(t)|xi = j, i, s, m)} πi(t) /
            [1^T diag{P(y(t)|xi = j, i, s, m)} πi(t)]     (3)
where 1 is the D + 1 dimensional vector of all ones. Eq. (3)
captures the relevant information dynamics that SM controls.
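As a concrete illustration, the belief update of Eq. (3) can be sketched in a few lines of Python (the function name and NumPy representation are our own choices, not part of the paper):

```python
import numpy as np

def bayes_update(belief, likelihood_col):
    """Belief-state update of Eq. (3) for one location.

    belief:         pi_i(t), a length-(D+1) probability vector
    likelihood_col: P(y(t) | x_i = j, i, s, m) for the observed y(t),
                    one entry per state j = 0, ..., D
    """
    unnormalized = likelihood_col * belief      # diag{P(y|x)} pi_i(t)
    return unnormalized / unnormalized.sum()    # divide by 1^T (...) term
```

For example, a uniform prior [0.5, 0.5] combined with a likelihood column [0.9, 0.1] yields the posterior [0.9, 0.1].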
In addition to information dynamics, there are resource
dynamics that characterize the available resources at stage t.
The dynamics for sensor s are given as:
Rs(t + 1) = Rs(t) − rs(us(t)); Rs(0) = Rs (4)
These dynamics constrain the admissible decisions by a
sensor, in that it can only use modes that do not use more
resources than are available.
Given the final information I(T), the quality of the
information collected is measured by making an estimate of
the state of each location i given the available information.
Denote these estimates as vi, i = 1, . . . , N. The Bayes cost
of selecting estimate vi when the true state is xi is denoted
as c(xi, vi) ∈ R, with c(xi, vi) ≥ 0. The objective of the SM
stochastic control formulation is to minimize:
J = Σ_{i=1}^{N} E[c(xi, vi)]     (5)
by selecting adaptive sensor control policies and final esti-
mates subject to the dynamics of Eq. (3) and the constraints
of Eq. (4) and Eq. (2).
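At the final stage, the estimation part of the objective decouples over locations: for each location, the optimal estimate minimizes the posterior expected Bayes cost. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def best_estimate(belief, cost):
    """Minimize the expected Bayes cost of Eq. (5) for one location.

    belief: posterior pi_i(T) over states 0, ..., D
    cost:   cost[x, v] = c(x_i = x, v_i = v), nonnegative
    Returns the minimizing estimate v and its expected cost.
    """
    expected = belief @ cost        # expected cost of each candidate v
    v = int(np.argmin(expected))
    return v, float(expected[v])
```

With a 0/1 error cost this reduces to the MAP estimate; e.g., belief [0.9, 0.1] gives v = 0 with expected cost 0.1.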
The results in [21] provide an SDP algorithm to solve
the above problem, with cost-to-go at stage t depending
on the joint belief state π(t) = [π1(t), . . . , πN (t)] and the
residual resource state R(t) = [R1(t), . . . , RS(t)]. Because
of this dependency, the cost-to-go does not decouple over
locations. This leads to a very large POMDP problem with
combinatorially many actions and an underlying belief state
of dimension (D + 1)^N, which is computationally intractable
unless there are very few locations.
In [21], the above stochastic control problem was replaced
with a simpler problem that provided a lower bound on the
optimal cost, by expanding the set of admissible strategies,
replacing the constraints of Eq. (2) by the “soft” constraints:
E[Σ_{t=0}^{T−1} rs(us(t))] ≤ Rs   ∀ s ∈ {1, . . . , S}     (6)
To solve the simpler problem, [21] proposed incorporation of
the soft constraints in Eq. (6) into the objective function using
Lagrange multipliers λs for each sensor s. The augmented
objective function is:
J̄λ = J + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs     (7)
A key result in [21] was that when the optimization of Eq. (7)
was done over mixed strategies for given values of Lagrange
multipliers λs, the stochastic control problem decoupled into
independent POMDPs for each location, and the optimiza-
tion could be performed using feedback strategies for each
location i that depended only on the information collected
for that location, Ii(t). These POMDPs have an underlying
information state-space of dimension D + 1, corresponding
to the number of possible states at a single location, and
can be solved efficiently. Because the measurements and
possible sensor actions are finite-valued, the set of possible
SM strategies Γ is also finite. Let Q(Γ) denote the set of
mixed strategies that assign probability q(γ) to the choice of
strategy γ ∈ Γ. The problem of finding the optimal mixed
strategies can be written as:
min_{q∈Q(Γ)} Σ_{γ∈Γ} q(γ) E^γ[J]     (8)

subject to:

Σ_{γ∈Γ} q(γ) E^γ[Σ_{i=1}^{N} Σ_{t=0}^{T−1} rs(us(t))] ≤ Rs,
    s ∈ {1, . . . , S}     (9)

Σ_{γ∈Γ} q(γ) = 1     (10)
where we have one constraint for each of the S sensor re-
source pools and an additional simplex constraint in Eq. (10)
which ensures that q ∈ Q(Γ) forms a valid probability
distribution. This is a large linear program (LP) in which
there is one variable per strategy in Γ. However, the total
number of constraints is S + 1, which establishes that
optimal solutions of this LP are mixtures of no more than
S + 1 strategies. Thus, one can use a column generation
approach [22], [23], [24] to quickly identify an optimal
mixed strategy. In this approach, one solves Eq. (8) and
Eq. (9) restricting the mixed strategies to be mixtures of
a small subset Γ′ ⊂ Γ. The solution of the restricted LP
has optimal dual prices λs, s = 1, . . . , S. Using these prices,
one can determine a corresponding optimal pure strategy by
minimizing:
Jλ = Σ_{i=1}^{N} E[c(xi, vi)] + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs     (11)
which the results in [21] show can be decoupled into
N independent optimization problems, one for each location.
Each of these problems is solved as a POMDP using standard
algorithms such as point-based value iteration (PBVI) [25]
to determine the best pure strategy γ1 for these prices. If the
best pure strategy γ1 is already in the set Γ′, then the solution
of Eq. (8) and Eq. (9) restricted to Q(Γ′) is an optimal mixed
strategy over all of Q(Γ). Otherwise, the strategy γ1 is added
to the admissible set Γ′, and the iteration is repeated. The
result is a set of mixed strategies that achieve a performance
level that is a lower bound on the original SM optimization
problem with hard constraints.
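To make the structure of the restricted master problem concrete, the following sketch solves it for the special case S = 1 by direct enumeration: since the LP has only S + 1 = 2 constraints, an optimal mixture uses at most two pure strategies, so checking singletons and all pairs with the resource constraint active is exact. This is an illustration under our own simplifications (function name and data layout are hypothetical), not the paper's implementation, which uses a general LP solver and a POMDP oracle to price new columns.

```python
def restricted_master_1sensor(columns, R):
    """Solve the restricted master LP for S = 1 by enumeration.

    columns: list of (expected_cost, expected_resource_use), one per
             pure strategy generated so far
    R:       sensor resource budget (the soft constraint of Eq. (6))
    Returns (optimal expected cost, mixture weights over columns).
    """
    n = len(columns)
    best = (float("inf"), None)
    # single-strategy candidates that are feasible on their own
    for j, (c, r) in enumerate(columns):
        if r <= R and c < best[0]:
            w = [0.0] * n
            w[j] = 1.0
            best = (c, w)
    # two-strategy mixtures with the resource constraint active: E[r] = R
    for j in range(n):
        for k in range(j + 1, n):
            cj, rj = columns[j]
            ck, rk = columns[k]
            if rj == rk:
                continue
            q = (R - rk) / (rj - rk)        # mixture weight on column j
            if 0.0 <= q <= 1.0:
                c = q * cj + (1.0 - q) * ck
                if c < best[0]:
                    w = [0.0] * n
                    w[j] = q
                    w[k] = 1.0 - q
                    best = (c, w)
    return best
```

For instance, mixing a cheap do-nothing column (cost 50, resource use 0) with an aggressive column (cost 2.8, resource use 218) under a budget of 100 yields a mixture that meets the budget with equality, mirroring how the tableau in Table I combines columns.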
III. COLUMN GENERATION AND POMDP SUBPROBLEM
EXAMPLE
We present an example to illustrate the column generation
algorithm and POMDP algorithms discussed previously. In
this simple example we consider 100 objects (N=100), 2 pos-
sible object types (D=2) with X = {non-military vehicle,
military vehicle}, and 2 sensors that each have one mode
(S = 2 and Ms = 1 ∀ s ∈ {1, 2}). Sensor s actions have
resource costs: rs, where r1 = 1, r2 = 2. Sensors return
2 possible observation values, corresponding to binary object
classifications, with likelihoods:
P(yi,1,1(t)|xi, u1(t)) = [ 0.90  0.10 ]     P(yi,2,1(t)|xi, u2(t)) = [ 0.92  0.08 ]
                         [ 0.10  0.90 ]                              [ 0.08  0.92 ]
where the (j, k) matrix entry denotes the likelihood that y =
j if xi = k. The second sensor has 2% better performance
than the first sensor but requires twice as many resources to
use. Each sensor has Rs = 100 units of resources, and can
view each location. Each of the 100 locations has a uniform
prior of πi = [0.5 0.5]^T ∀ i. For the performance objective,
we use c(xi, vi) = 1 if xi ≠ vi, and 0 otherwise, so each
classification error costs 1 unit.
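As a quick, hand-rolled check of the per-location Bayes risk in this example, the sketch below enumerates measurement sequences for a symmetric binary sensor under a 0/1 error cost (our own illustration; ties in the MAP decision are broken arbitrarily, and the function name is hypothetical):

```python
from itertools import product

def expected_error(prior, p_correct, n_meas):
    """Expected classification error for one location after n_meas
    conditionally independent binary measurements of accuracy p_correct."""
    like = [[p_correct, 1.0 - p_correct],   # P(y = 1 | x), x in {1, 2}
            [1.0 - p_correct, p_correct]]   # P(y = 2 | x)
    err = 0.0
    for ys in product([0, 1], repeat=n_meas):
        # joint probability of this measurement sequence under each state
        joint = []
        for x in (0, 1):
            p = prior[x]
            for y in ys:
                p *= like[y][x]
            joint.append(p)
        err += min(joint)   # the MAP decision errs with the smaller mass
    return err
```

For sensor 1 (accuracy 0.9) and the uniform prior, one measurement leaves an error probability of 0.1 and three measurements leave 0.028; these per-location risks are what the column generation tableau trades off against resource use.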
Table I demonstrates the column generation solution pro-
cess. The first three columns are initialized by guessing val-
ues of resource prices and obtaining the POMDP solutions,
yielding expected costs and expected resource use for each
sensor at those resource prices. A small LP is solved to obtain
the optimal mixture of the first three strategies γ1, . . . , γ3,
and a corresponding set of dual prices. These dual prices are
used in the POMDP solver to generate the fourth column
γ4, which yields a strategy that is different from that of the
first 3 columns. The LP is re-solved for mixtures of the first
4 strategies, yielding new resource prices that are used to
generate the next column. This process continues until the
solution using the prices after 7 columns yields a strategy that
was already represented in a previous column, terminating
              γ1      γ2     γ3     γ4     γ5     γ6     γ7
min           50.0    2.80   2.44   1.818  8      10     6.22
R1            0       218    200    0      0      100    150    ≤ 100
R2            0       0      36     800    200    0      18     ≤ 100
Simplex       1       1      1      1      1      1      1      = 1
Optimal cost  -       -      26.22  21.28  7.35   5.95   5.95
Mix. weights  0       0.424  0      0      0.500  0.076  0
λc1           1.0e15  0.024  0.010  0.238  0.227  0.217  0.061
λc2           1.0e15  0.025  0.015  0      0.060  0.210  0.041
TABLE I: Column generation example with 100 objects. The tableau
is displayed in its final form after convergence. λc_s describe the
λ trajectories up until convergence. R1 and R2 are resource
constraints. γ1 is a ‘do-nothing’ strategy. Bold numbers represent
useful solution data.
Fig. 1: The 3 policy graphs that correspond to columns 2, 5 and
6 of Table I. The frequency of choosing each of these 3 strategies
is controlled by the relative proportion of the mixture weight
qc ∈ (0, 1) with c ∈ {2, 5, 6}.
the algorithm. The optimal mixture combines the strategies
of the second, fifth and sixth columns. When the master prob-
lem converges, the optimal cost J∗ for the mixed strategy
is 5.95 units. The resulting policy graphs are illustrated in
Fig. 1, where branches up indicate measurements y = 1
(‘non-military’) and down y = 2 (‘military’). The red and
green nodes denote the final decision, vi, for a location.
Note that the strategy of column 5 uses only the second
sensor, whereas the strategies of columns 2 and 6 use only
the first sensor. The mixed strategy allows the soft resource
constraints to be satisfied with equality. Table I also shows
the resource costs and expected classification performance
of each column.
The example illustrates some of the issues associated with
the use of soft constraints in the optimization: the resulting
solution does not lead to SM strategies that will always
satisfy the hard constraints Eq. (2). We address this issue
in the subsequent section.
IV. RECEDING HORIZON CONTROL
The column generation algorithm described previously
solves the approximate SM problem with “soft” constraints
in terms of mixed strategies that, on average, satisfy the
resource constraints. However, for control purposes, one
must select actual SM actions that satisfy the hard constraints
Eq. (2). Another issue is that the solutions of the decoupled
POMDPs provide individual sensor schedules for each lo-
cation that must be interleaved into a single coherent sensor
schedule. Furthermore, exact solution of the small decoupled
POMDPs for each set of prices can be time consuming,
making the resulting algorithm unsuited for real-time SM.
To address this, we will explore a set of RH algorithms
that will convert the mixed strategy solutions discussed in the
previous section to actions that satisfy the hard constraints,
and limit the computational complexity of the resulting
algorithm. The RH algorithms have adjustable parameters
whose effects we will explore in simulation.
The RH algorithms start at stage t with an information
state/resource state pair, consisting of available information
about each location i = 1, . . . , N represented by the condi-
tional probability vector πi(t) and available sensor resources
Rs(t), s = 1, . . . , S. The first step in the algorithms is
to solve the SM problem of Eq. (5) starting at stage t to
final stage T subject to soft constraints Eq. (6), using the
hierarchical column generation / POMDP algorithms to get
a set of mixed strategies. We introduce a parameter corre-
sponding to the maximum number of sensing actions per
location to control the resulting computational complexity
of the POMDP algorithms.
The second step is to select sensing actions to implement
at the current stage t from the mixed strategies. These
strategies are mixtures of at most S + 1 pure strategies,
with associated probabilistic weights. We explore three ap-
proaches for selecting sensing actions:
• str1: Select the pure strategy with maximum probability.
• str2: Randomly select a pure strategy per location
according to the optimal mixture probabilities.
• str3: Select the pure strategy with positive probability
that minimizes the expected sensor resource use (and
thus leaves resources for use in future stages).
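The three selection rules above can be sketched as follows (function and argument names are our own):

```python
import random

def select_pure_strategy(strategies, weights, resource_use, rule, rng=random):
    """Pick one pure strategy from an optimal mixture.

    strategies:   the at most S + 1 pure strategies in the mixture support
    weights:      mixture probabilities q(gamma), summing to 1
    resource_use: expected resource consumption of each pure strategy
    rule:         'str1', 'str2', or 'str3' as described above
    """
    support = [i for i, q in enumerate(weights) if q > 0]
    if rule == "str1":   # most likely pure strategy
        return strategies[max(support, key=lambda i: weights[i])]
    if rule == "str2":   # sample according to the mixture probabilities
        return rng.choices(strategies, weights=weights, k=1)[0]
    if rule == "str3":   # cheapest strategy in expected resource use
        return strategies[min(support, key=lambda i: resource_use[i])]
    raise ValueError("unknown rule: " + rule)
```

Only str2 is randomized; str1 and str3 are deterministic given the mixture, which is why their simulated performance can be compared directly across resource levels.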
Once pure strategies for each location have been selected,
the third step is to select a sensing action to be implemented
for each location. Our approach is to select the first sensing
action of the pure strategy for each location. Note that
there may not be enough sensor resources to execute the
selected actions, particularly in the case where the pure
strategy with maximum probability is selected. To address
this, we rank sensing actions by their expected entropy
gain [26], which is the expected reduction in entropy of
the conditional probability distribution, πi(t), based on the
anticipated measurement value. We schedule sensor actions
in order of decreasing expected entropy gain, and perform
those actions at stage t that have enough sensor resources to
be feasible.
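The expected entropy gain used for this ranking can be computed directly from the belief state and the mode's likelihood matrix; a sketch (our own, assuming base-2 entropy and the finite observation model of Section II):

```python
import numpy as np

def expected_entropy_gain(belief, likelihood):
    """Expected reduction in entropy of pi_i(t) from one measurement.

    belief:     length-(D+1) belief state for the location
    likelihood: likelihood[y, x] = P(y | x) for the chosen sensor mode
    """
    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    p_y = likelihood @ belief                       # predictive P(y)
    gain = entropy(belief)
    for y, py in enumerate(p_y):
        if py > 0:
            posterior = likelihood[y] * belief / py  # Bayes update
            gain -= py * entropy(posterior)          # subtract E[H(post)]
    return gain
```

Actions are then executed in decreasing order of this gain until the remaining resources are exhausted. For the 0.9-accuracy sensor of Section III and a uniform prior the gain is about 0.53 bits, while an uninformative sensor yields 0.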
The measurements collected from the scheduled actions
are used to update the information states πi(t + 1) using
Eq. (3). The resources used by the actions are eliminated
from the available resources to compute Rs(t + 1) using
Eq. (4). The RH algorithm is then executed from the new
information state/resource state condition.
           Search               Low-res              Hi-res
           y1    y2    y3      y1    y2    y3      y1    y2    y3
empty      0.92  0.04  0.04   0.95  0.03  0.02   0.95  0.03  0.02
car        0.08  0.46  0.46   0.05  0.85  0.10   0.02  0.95  0.03
truck      0.08  0.46  0.46   0.05  0.10  0.85   0.02  0.90  0.08
military   0.08  0.46  0.46   0.05  0.10  0.85   0.02  0.03  0.95
TABLE II: Observation likelihoods for the different sensor
modes with the observation symbols y1, y2 and y3.
Fig. 2: Illustration of scenario with two partially-overlapping
sensors.
V. SIMULATION RESULTS
In order to evaluate the relative performance of the differ-
ent RH algorithms, we performed a set of experiments with
simulations. In these experiments, there were 100 locations,
each of which could be empty, or have objects of three types,
so the possible states of location i were xi ∈ {0, 1, 2, 3}
where type 1 represents cars, type 2 trucks, and type 3
military vehicles. Sensors can have several modes: a search
mode, a low resolution mode and a high resolution mode.
The search mode primarily detects the presence of objects;
the low resolution mode can identify cars, but confuses
the other two types, whereas the high resolution mode can
separate the three types. Observations are modeled as having
three possible values. The search mode consumes 0.25 units
of resources, whereas the low-resolution mode consumes
1 unit and the high resolution mode 5 units, uniformly
for each sensor and location. Table II shows the likelihood
functions that were used in the simulations.
Initially, each location has a state with one of two prior
probability distributions: πi(0) = [0.10 0.60 0.20 0.10]^T for
i ∈ [1, . . . , 10], or πi(0) = [0.80 0.12 0.06 0.02]^T for i ∈
[11, . . . , 100]. Thus, the first 10 locations are likely to contain
objects, whereas the other 90 locations are likely to be empty.
When multiple sensors are present, they may share some
locations in common, and have locations that can only be
seen by a specific sensor, as illustrated in Fig. 2.
The cost function used in the experiments, c(xi, vi), is
shown in Table III. The parameter MD represents the cost
of a missed detection, and will be varied in the experiments.
Table IV shows simulation results for a search and clas-
sify scenario involving 2 identical sensors (with the same
xi \ vi    empty  car  truck  military
empty      0      1    1      1
car        1      0    0      1
truck      1      0    0      1
military   MD     MD   MD     0
TABLE III: Decision costs
          MD = 1              MD = 5                 MD = 10
          str1  str2  str3   str1   str2   str3    str1   str2   str3
Hor. 3
Res 30    3.64  3.85  3.85   11.82  12.88  12.23   15.28  14.57  14.50
Res 50    2.40  2.80  2.43   6.97   6.93   7.84    10.98  9.99   10.45
Res 70    2.45  2.32  1.88   3.44   3.99   4.04    6.14   6.48   5.10
Hor. 4
Res 30    3.58  3.46  3.52   12.28  12.62  11.90   14.48  15.91  15.59
Res 50    2.37  2.21  2.33   7.44   7.44   7.20    9.94   9.28   10.65
Res 70    1.68  1.33  1.60   3.59   3.57   3.62    6.30   5.18   5.86
Hor. 6
Res 30    3.51  3.44  3.73   11.17  11.85  12.09   15.17  14.99  13.6
Res 50    2.28  2.11  2.31   7.29   8.02   7.70    10.67  10.47  11.25
Res 70    1.43  1.38  1.44   3.60   3.73   3.84    4.91   5.09   5.94
Bounds
Res 30    3.35                11.50                 13.85
Res 50    2.21                6.27                  9.40
Res 70    1.32                2.95                  4.96
TABLE IV: Simulation results for 2 homogeneous, multi-modal
sensors. str1: select the most likely pure strategy for all locations;
str2: randomize the choice of strategy per location according to
mixture probabilities; str3: select the strategy that yields the least
expected use of resources for all locations.
visibility), evaluating different versions of the RH control
algorithms and with different resource levels. The variable
“Horizon” is the total number of sensor actions allowed per
location plus one additional action for estimating the location
content. The table shows results for different resource levels
per sensor, from 30 to 70 units, and displays the lower bound
performance computed. The missed detection cost MD is
varied from 1 to 10. The results shown in Table IV represent
the average of 100 Monte Carlo simulation runs of the RH
algorithms.
The results show that using a longer horizon in the
planning improves performance minimally, so that using
a RH replanning approach with a short horizon can be
used to reduce computation time with limited performance
degradation. The results also show that the different RH
algorithms have performance close to the optimal lower
bound in most cases, with the exception being the case of
MD = 5 with 70 units of sensing resources per sensor. For
a horizon 6 plan, the longest horizon studied, the simulation
performance is close to that of the associated bound. In
terms of which strategy is preferable for converting the
mixed strategies, the results of Table IV are unclear. For
short planning horizons in the RH algorithms, the preferred
strategy appears to be to use the least resources (str3), thus
allowing for improvement from replanning. For the longer
horizons, there was no significant difference in performance
among the three strategies. To illustrate the computational
requirements of this scenario (4 states, 3 observations, 2 sen-
sors (6 actions), full sensor-overlap), the number of columns
generated by the column generation algorithm to compute a
set of mixed strategies was on the order of 10-20 columns
for the horizon 6 algorithms, which takes about 60 s on
a 2.2 GHz single-core Intel P4 machine under Linux, using
C code in ‘Debug’ mode (with 1000 belief points for PBVI).
Memory usage without optimizations is around 3 MB.
There are typically 4-5 planning sessions in a simulation.
Profiling indicates that roughly 80% of the computing time
                Homogeneous            Heterogeneous
                MD                     MD
                1     5      10        1     5      10
Horizon 3
Res[150, 150]   5.69  16.93  30.38     6.34  18.15  31.23
Res[250, 250]   4.61  16.11  25.92     5.53  16.77  29.32
Res[350, 350]   4.23  15.30  21.45     5.12  16.41  27.41
Horizon 4
Res[150, 150]   5.02  16.06  20.61     5.64  16.85  20.61
Res[250, 250]   3.94  9.46   12.66     4.58  12.05  14.87
Res[350, 350]   3.35  8.58   12.47     4.28  9.41   12.65
Horizon 6
Res[150, 150]   4.62  15.66  19.56     5.27  16.20  19.56
Res[250, 250]   2.92  8.24   10.91     3.32  8.83   11.35
Res[350, 350]   2.18  4.86   7.15      2.66  6.63   9.17
TABLE V: Comparison of lower bounds for 2 homogeneous, bi-
modal sensors vs. 2 heterogeneous sensors.
goes towards value backups in the PBVI routine, and 15%
goes towards recursively tracing decision trees in order to
back out the measurement costs from hyperplane costs.
(Every node in a decision tree, i.e., policy graph, for each
pure strategy has a corresponding hyperplane whose cost
coefficients represent the combined classification and
measurement costs.)
In the next set of experiments, we compare the use of
heterogeneous sensors that have different modes available. In
these experiments, the 100 locations are guaranteed to have
an object, so xi = 0 is not feasible. The prior probability
of object type for each location is πi(0) = [0 0.7 0.2 0.1]^T.
Table V shows the results of experiments with sensors that
have all sensing modes, versus an experiment where one
sensor has only a low-resolution mode and the other sensor
has both high and low-resolution modes. The table shows the
lower bounds predicted by the column generation algorithm,
to illustrate the change in performance expected from the
different architectural choices of sensors. The results indicate
that specialization of one sensor can lead to significant
degradation in performance due to inefficient use of its
resources.
The next set of results explore the effect of spatial distribu-
tion of sensors. We consider experiments where there are two
homogeneous sensors which have only partially-overlapping
coverage zones. (We define a ‘visibility group’ as a set
of sensors that have a common coverage zone). Table VI
gives bounds for different percentages of overlap. Note
that, even when there is only 20% overlap, the achievable
performance is similar to that of the 100% overlap case in
Table V, indicating that proper choice of strategies can lead
to efficient sharing of resources from different sensors and
equalizing their workload. The last set of results show the
performance of the RH algorithms for three homogeneous
sensors with partial overlap and different resource levels.
The visibility groups are graphically portrayed in Fig. 3.
Table VII presents the simulated cost values averaged over
100 simulations of the different RH algorithms and the lower
bounds. The results support our previous conclusions: when
a short horizon is used in the RH algorithm, and there are
sufficient resources, the strategy that uses the least resources
is preferred as it allows for replanning when new information
                Overlap 60%            Overlap 20%
                MD                     MD
                1     5      10        1     5      10
Horizon 3
Res[150, 150]   5.69  16.93  30.38     5.69  16.93  30.38
Res[250, 250]   4.61  16.11  25.98     4.61  16.11  25.92
Res[350, 350]   4.23  15.30  21.45     4.23  15.30  21.45
Horizon 4
Res[150, 150]   5.02  16.06  20.61     5.02  15.93  20.61
Res[250, 250]   3.94  9.46   12.66     3.94  9.46   12.66
Res[350, 350]   3.35  8.58   12.47     3.35  8.58   12.47
Horizon 6
Res[150, 150]   4.62  15.66  19.56     4.62  15.66  19.56
Res[250, 250]   2.92  8.25   10.91     2.94  8.24   10.91
Res[350, 350]   2.18  4.86   7.19      2.18  4.86   7.16
TABLE VI: Comparison of performance bounds with 2 homo-
geneous sensors with partial overlap in coverage. Only the bold
numbers are different.
Fig. 3: The 7 visibility groups for the 3 sensor experiment
indicating the number of locations in each group.
is available. If the RH algorithm uses a longer horizon, then
its performance approaches the theoretical lower bound, and
the difference in performance between the three approaches
for sampling the mixed strategy to obtain a pure strategy
is statistically insignificant. Our results suggest that RH
control with modest horizons of 2 or 3 sensor actions per
location can yield performance close to the best achievable
performance using mixed strategies. If shorter horizons are
used to reduce computation, then an approach that samples
mixed strategies by using the smallest amount of resources
is preferred. The results also show that, with proper SM,
geographically distributed sensors with limited visibility can
be coordinated to achieve equivalent performance to centrally
          MD = 1              MD = 5                 MD = 10
          str1  str2  str3   str1   str2   str3    str1   str2   str3
Horizon 3
Res 100   5.26  6.08  5.57   17.23  17.44  16.79   22.02  21.93  22.16
Res 166   5.91  4.81  3.13   10.23  11.91  9.21    14.19  16.66  12.85
Res 233   3.30  3.75  3.43   10.15  9.32   5.88    14.49  12.55  8.21
Horizon 4
Res 100   5.32  5.58  5.93   17.26  16.88  16.17   21.92  20.94  21.35
Res 166   3.42  4.07  3.24   8.63   8.00   9.04    12.05  11.71  14.08
Res 233   3.65  3.07  3.29   5.27   7.14   5.38    8.25   10.08  7.90
Horizon 6
Res 100   5.79  5.51  5.98   17.13  17.90  17.44   22.03  20.56  22.17
Res 166   2.96  2.68  2.72   10.22  8.33   9.08    9.82   11.47  11.57
Res 233   1.52  2.00  1.70   4.81   4.13   4.24    5.64   7.20   5.11
Bounds
Res 100   4.62                15.66                 19.56
Res 166   2.92                8.22                  10.89
Res 233   2.18                4.87                  7.18
TABLE VII: Simulation results for 3 homogeneous sensors with
partial overlap as shown in Fig. 3.
pooled resources.
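The conversion from a mixed strategy to a single resource-feasible pure strategy can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the three rules below (proportional sampling, highest weight, least resources) are plausible stand-ins for the strategies compared in Table VII, whose exact definitions appear earlier in the paper, and `select_pure_strategy` with its column format is hypothetical.

```python
import random

def select_pure_strategy(columns, rule="least_resource", rng=None):
    """Pick one pure strategy (column) from a mixed-strategy solution.

    `columns` is a list of dicts with keys:
      'prob'     -- mixture weight from the master LP (weights sum to 1),
      'resource' -- resources the plan would consume,
      'plan'     -- the sensor-action plan itself.
    """
    rng = rng or random.Random(0)          # deterministic default seed
    support = [c for c in columns if c["prob"] > 0]
    if rule == "random":                   # sample in proportion to LP weights
        r, acc = rng.random(), 0.0
        for c in support:
            acc += c["prob"]
            if r <= acc:
                return c
        return support[-1]
    if rule == "most_likely":              # take the highest-weight column
        return max(support, key=lambda c: c["prob"])
    if rule == "least_resource":           # take the cheapest column in the support
        return min(support, key=lambda c: c["resource"])
    raise ValueError(rule)

cols = [
    {"prob": 0.6, "resource": 233, "plan": "aggressive"},
    {"prob": 0.4, "resource": 100, "plan": "frugal"},
]
print(select_pure_strategy(cols)["plan"])                  # frugal
print(select_pure_strategy(cols, "most_likely")["plan"])   # aggressive
```

The least-resource rule matches the observation above: with short horizons and frequent replanning, committing the fewest resources now preserves the most options for later stages.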
In terms of the computational complexity of our RH al-
gorithms, the main bottleneck is the solution of the POMDP
problems. The LPs solved in the column generation approach
are small and are solved in minimal time. Solving the
POMDPs required to generate each column (one POMDP
for each visibility group in cases with partial sensor overlap)
is tractable by virtue of the hierarchical breakdown of the
SM problem into independent subproblems. These computations
could be accelerated further on multi-core CPUs or GPUs, as
the independent per-group POMDP solves are highly
parallelizable.
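Because the per-group POMDP solves share no state, fanning them out across workers is straightforward. The sketch below illustrates only the dispatch pattern: `solve_group_pomdp` is a hypothetical stand-in that returns a dummy value rather than running an actual POMDP solver, and the group data are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_group_pomdp(group):
    """Toy stand-in for a per-visibility-group POMDP solve.

    A real implementation would run point-based value iteration
    (e.g. [25]) over the locations in the group; here we return a
    placeholder value so the dispatch pattern is runnable.
    """
    name, n_locations = group
    return name, 1.0 * n_locations   # placeholder "cost-to-go"

groups = [("G1", 12), ("G2", 7), ("G3", 21)]

# The subproblems are independent, so each column-generation step can
# fan them out across workers. Threads are used here for portability;
# CPU-bound solvers would favor processes or a GPU.
with ThreadPoolExecutor(max_workers=3) as pool:
    values = dict(pool.map(solve_group_pomdp, groups))

print(values)   # {'G1': 12.0, 'G2': 7.0, 'G3': 21.0}
```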
VI. CONCLUSIONS
In this paper, we introduce RH algorithms for near-
optimal, closed-loop SM with multi-modal, resource-
constrained, heterogeneous sensors. These RH algorithms
exploit a lower bound formulation developed in earlier work
that decomposes the SM optimization into a master problem,
which is addressed with linear-programming techniques, and
single location stochastic control problems that are solved
using POMDP algorithms. The resulting algorithm generates
mixed strategies for sensor plans, and the RH algorithms
convert these mixed strategies into sensor actions that satisfy
sensor resource constraints.
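The shape of this decomposition can be illustrated with a toy master problem. Under the relaxation's averaged resource constraint, the master LP mixes pure strategies (columns) to minimize expected cost; with a single constraint the optimum lies on a mixture of at most two columns, so this hypothetical sketch enumerates LP vertices directly instead of calling a solver. The column costs and resource figures are invented for illustration.

```python
from itertools import combinations

def master_lp(columns, budget):
    """Solve  min sum_i p_i*cost_i  s.t.  sum_i p_i*res_i <= budget,
    sum_i p_i = 1, p_i >= 0,  for a list of (cost, res) columns.

    With one averaged resource constraint, LP vertices are either a
    single feasible column or a two-column mixture that meets the
    budget with equality, so a toy solver can enumerate both cases.
    """
    best = None   # (value, weights)
    n = len(columns)
    for i, (c, r) in enumerate(columns):          # feasible single columns
        if r <= budget:
            w = [0.0] * n
            w[i] = 1.0
            if best is None or c < best[0]:
                best = (c, w)
    for i, j in combinations(range(n), 2):        # budget-tight mixtures
        (ci, ri), (cj, rj) = columns[i], columns[j]
        if ri == rj:
            continue
        p = (budget - rj) / (ri - rj)             # weight on column i
        if 0.0 <= p <= 1.0:
            val = p * ci + (1 - p) * cj
            w = [0.0] * n
            w[i], w[j] = p, 1.0 - p
            if best is None or val < best[0]:
                best = (val, w)
    return best

# Two hypothetical columns: an accurate, expensive plan and a cheap one.
cols = [(2.0, 233.0), (6.0, 100.0)]
value, weights = master_lp(cols, budget=166.0)
```

Here the relaxed optimum mixes the two plans and achieves an expected cost of about 4.02, strictly better than the best feasible pure strategy (cost 6.0); that gap is exactly why the relaxation yields a lower bound that no pure-strategy plan can match, and why the RH algorithms must sample the mixture.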
Our simulation results show that the RH algorithms
achieve performance close to that of the theoretical lower
bounds in [21]. The results also highlight the respective
benefits of longer RH horizons and of alternative approaches
to sampling the mixed-strategy solutions. Our simulations also
show the effects of geographically
distributing sensors so that there is limited overlap in field
of view and the effects of specializing sensors by using a
restricted number of modes.
There are many interesting directions for extensions to
this work. First, one could consider the presence of object
dynamics, where objects can arrive at or depart from specific
locations. Second, one can also consider options for sensor
motion, where individual sensors can change locations and
thus observe new areas. Third, one could consider a set of
objects that have deterministic, but time-varying, visibility
profiles. Finally, one could consider approaches that reduce
the computational complexity of the resulting algorithms, ei-
ther through exploitation of parallel computing architectures
or through the use of off-line learning or other approximation
techniques.
REFERENCES
[1] B. Koopman, Search and Screening: General Principles with Historical
Applications. Pergamon, New York, NY, 1980.
[2] S. J. Benkoski, M. G. Monticino, and J. R. Weisinger, “A survey of
the search theory literature,” Naval Research Logistics, vol. 38, no. 4,
pp. 469–494, 1991.
[3] D. A. Castañón, "Optimal search strategies in dynamic hypothesis
testing," IEEE Transactions on Systems, Man and Cybernetics, vol. 25,
no. 7, pp. 1130–1138, Jul. 1995.
[4] A. Wald, “On the efficient design of statistical investigations,” The
Annals of Mathematical Statistics, vol. 14, pp. 134–140, 1943.
[5] ——, “Sequential tests of statistical hypotheses,” The Annals of
Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945. [Online].
Available: http://www.jstor.org/stable/2235829
[6] D. V. Lindley, “On a measure of the information provided by an
experiment,” Annals of Mathematical Statistics, vol. 27, pp. 986–1005,
1956.
[7] J. C. Kiefer, “Optimum experimental designs,” Journal of the Royal
Statistical Society Series B, vol. 21, pp. 272–319, 1959.
[8] H. Chernoff, Sequential Analysis and Optimal Design. SIAM,
Philadelphia, PA, 1972.
[9] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, New
York, 1972.
[10] K. Kastella, "Discrimination gain to optimize detection and
classification," IEEE Transactions on Systems, Man and Cybernetics,
Part A, vol. 27, no. 1, pp. 112–116, Jan. 1997.
[11] C. Kreucher, K. Kastella, and A. O. Hero III, "Sensor management
using an active sensing approach," Signal Processing, vol. 85, no. 3,
pp. 607–624, 2005.
[12] M. Athans, "On the determination of optimal costly measurement
strategies for linear stochastic systems," Automatica, vol. 8, no. 4,
pp. 397–412, 1972. [Online]. Available:
http://www.sciencedirect.com/science/article/B6V21-47SV18C-5/2/dc50e03b2ec82f34c592d4056c0da466
[13] V. Krishnamurthy and R. Evans, "Hidden Markov model multiarm
bandits: A methodology for beam scheduling in multitarget tracking,"
IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 2893–2908,
Dec. 2001.
[14] R. Washburn, M. Schneider, and J. Fox, "Stochastic dynamic
programming based approaches to sensor resource management," in
Proceedings of the Fifth International Conference on Information
Fusion, vol. 1, 2002, pp. 608–615.
[15] J. C. Gittins, “Bandit processes and dynamic allocation indices,”
Journal of the Royal Statistical Society. Series B (Methodological),
vol. 41, no. 2, pp. 148–177, 1979. [Online]. Available:
http://www.jstor.org/stable/2985029
[16] W. Macready and D. H. Wolpert, "Bandit problems and the
exploration/exploitation tradeoff," IEEE Transactions on Evolutionary
Computation, vol. 2, no. 1, pp. 2–22, Apr. 1998.
[17] C. Kreucher and A. O. Hero III, "Monte Carlo methods for sensor
management in target tracking," in IEEE Nonlinear Statistical Signal
Processing Workshop, 2006.
[18] E. Chong, C. Kreucher, and A. Hero, "Monte-Carlo-based partially
observable Markov decision process approximations for adaptive
sensing," in Proceedings of the 9th International Workshop on Discrete
Event Systems (WODES 2008), May 2008, pp. 173–180.
[19] D. A. Castañón, A. Hero, D. Cochran, and K. Kastella, Foundations
and Applications of Sensor Management, 1st ed. Springer, 2008,
ch. 1.
[20] D. A. Castañón, "Approximate dynamic programming for sensor
management," in Proc. 36th IEEE Conference on Decision and Control,
1997, pp. 1202–1207.
[21] ——, "Stochastic control bounds on sensor network performance,"
in Proc. 44th IEEE Conference on Decision and Control and 2005
European Control Conference (CDC-ECC '05), Dec. 2005, pp. 4939–4944.
[22] P. C. Gilmore and R. E. Gomory, “A linear programming
approach to the cutting-stock problem,” Operations Research,
vol. 9, no. 6, pp. 849–859, 1961. [Online]. Available:
http://www.jstor.org/stable/167051
[23] G. B. Dantzig and P. Wolfe, “The decomposition algorithm for
linear programs,” Econometrica, vol. 29, no. 4, pp. 767–778, 1961.
[Online]. Available: http://www.jstor.org/stable/1911818
[24] K. A. Yost and A. R. Washburn, "The LP/POMDP marriage: Optimiza-
tion with imperfect information," Naval Research Logistics, vol. 47,
no. 8, pp. 607–619, 2000.
[25] J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An
anytime algorithm for POMDPs," in International Joint Conference on
Artificial Intelligence (IJCAI), Aug. 2003, pp. 1025–1032.
[26] K. Kastella, "Discrimination gain for sensor management in
multitarget detection and tracking," in IEEE-SMC and IMACS
Multiconference CESA 1996, vol. 1, Jul. 1996, pp. 167–172.

  • 1. Receding Horizon Stochastic Control Algorithms for Sensor Management Darin Hitchings and David A. Casta˜n´on Abstract— The increasing use of smart sensors that can dynamically adapt their observations has created a need for algorithms to control the information acquisition process. While such problems can usually be formulated as stochastic control problems, the resulting optimization problems are complex and difficult to solve in real-time applications. In this paper, we consider sensor management problems for sensors that are trying to find and classify objects. We propose alternative approaches for sensor management based on receding horizon control using a stochastic control approximation to the sensor management problem. This approximation can be solved using combinations of linear programming and stochastic control techniques for partially observed Markov decision problems in a hierarchical manner. We explore the performance of our proposed receding horizon algorithms in simulations using heterogeneous sensors, and show that their performance is close to that of a theoretical lower bound. Our results also suggest that a modest horizon is sufficient to achieve near- optimal performance. I. INTRODUCTION Recent advances in embedded computing have introduced a new generation of sensors that have the capability of adapting their sensing dynamically in response to collected information. For instance, unmanned aerial vehicles (UAVs) have multiple sensors —radar and electro-optical cameras —which can dynamically change their fields of view and measurement modes. These advances have created a need for a commensurate theory of sensor management (SM) and control to ensure that relevant information is collected for the mission of the sensor system given the available sensor resources. There are numerous applications involving surveillance, diagnosis and fault identification that require such control. 
One of the earliest examples of SM arose in the context of search, with applications to anti-submarine warfare [1]. Sensors had the ability to move spatially and allocate their search effort over time and space. Most of the early work on search theory focused on open-loop search plans rather than feedback control of search trajectories [2]. Extensions of search theory to problems requiring adaptive feedback strategies have been developed in some restricted contexts [3].

Adaptive SM has its roots in the field of statistics, where Bayesian experiment design was used to configure subsequent experiments based on observed information. Wald [4], [5] considered sequential hypothesis testing with costly observations. Lindley [6] and Kiefer [7] expanded the concepts to include variations in potential measurements. Chernoff [8] and Fedorov [9] used Cramér-Rao bounds for selecting sequences of measurements for nonlinear regression problems. Most of the strategies proposed for Bayesian experiment design involve single-step optimization criteria, resulting in greedy or myopic strategies that optimize bounds on the expected performance after the next experiment. Other approaches to adaptive SM using single-stage optimization have been proposed using alternative information theoretic measures [10], [11].

Feedback control approaches to SM that consider optimization over time have also been explored. Athans [12] considered a two-point boundary value approach to controlling the error covariance in linear estimators by choosing the measurement matrices. Multi-armed bandit formulations have been used to control individual sensors in applications related to target tracking [13], [14].

(This work was supported by a grant from AFOSR. The authors are with the Dept. of Electrical & Computer Eng., Boston University, dhitchin@bu.edu, dac@bu.edu.)
Such approaches are restricted to single-sensor control, selecting among individual subproblems to measure, in order to obtain solutions using Gittins indices [15], [16]. Approximate dynamic programming (DP) techniques have also been proposed using approximations to the optimal cost-to-go based on information theoretic measures evaluated using Monte Carlo techniques [17], [18]. A good overview of these techniques is available in [19].

The above approaches for dynamic feedback control are limited in application to problems with a small number of sensor-action choices and simple constraints, because the algorithms must enumerate and evaluate the various control actions. In [20], combinatorial optimization techniques are integrated into a DP formulation to obtain approximate stochastic dynamic programming (SDP) algorithms that extend to large numbers of sensor actions. Subsequent work in [21] derived an SDP formulation using partially observed Markov decision processes (POMDPs) and obtained a computable lower bound to the achievable performance of feedback strategies for complex multi-sensor management problems. The lower bound was obtained by a convex relaxation of the original combinatorial POMDP using mixed strategies and averaged constraints. However, the results in [21] do not specify algorithms with performance close to the lower bound.

In this paper, we develop and implement algorithms for the efficient computation of adaptive SM strategies for complex problems involving multiple sensors with different observation modes and large numbers of objects. The algorithms are based on using the lower bound formulation from [21] as an objective in a receding horizon (RH) optimization problem and developing techniques for obtaining feasible decisions from the mixed strategy solutions. The resulting
algorithms are scalable to large numbers of tasks, and suitable for real-time SM. We also extend the model of [21] to incorporate search actions in addition to classification. We evaluate alternative approaches for obtaining feasible decision strategies, and evaluate the resulting performance of the RH algorithms using multi-sensor simulations. Our simulation results demonstrate that our RH algorithms achieve performance comparable to the predicted lower bound of [21] and shed insight into the relative value of different strategies for partitioning sensor resources either geographically or by sensor specialization.

The rest of this paper is organized as follows: Section II describes the formulation of the stochastic SM problem. Section III provides an example of the column generation technique for generating mixed strategies for SM. Section IV discusses how we create feasible, sequenced sensor schedules from these mixed strategies. Section V documents our simulation results for various scenarios. Section VI summarizes our results and discusses areas for future work.

II. PROBLEM FORMULATION AND BACKGROUND

The problem formulation is an extension of the POMDP formulation presented in [21]. Assume that there are a finite number of locations 1, . . . , N, each of which may have an object of a given type, or which may be empty. Assume that there is a set of S sensors, each of which has multiple sensor modes, and that each sensor can observe one and only one location at each time with a selected mode. Let xi ∈ {0, 1, . . . , D} denote the state of location i, where xi = 0 if location i is unoccupied, and otherwise xi = k > 0 indicates that location i has an object of type k. Let πi(0) be a discrete probability distribution over the D + 1 possible states of the ith location, for i = 1, . . . , N, where D ≥ 2. Assume additionally that the random variables xi, i = 1, . . . , N are mutually independent. There are s = 1, . . . , S sensors, each of which has m = 1, . . .
, Ms possible modes of observation. We assume there is a series of T discrete decision stages where sensors can select which location to measure, where T is large enough so that all of the sensors can use their available resources. At each stage, each sensor can choose to employ one and only one of its modes on a single location to collect a noisy measurement concerning the state xi at that location. Each sensor s has a limited set of locations that it can observe, denoted by Os ⊆ {1, . . . , N}. A sensor action by sensor s at stage t is a pair

us(t) = (is(t), ms(t))   (1)

consisting of a location to observe, is ∈ Os, and a mode for that observation, ms.

Sensor measurements are modeled as belonging to a finite set y ∈ {1, . . . , Ls}. The likelihood of the measured value is assumed to depend on the sensor s, sensor mode m, location i and on the true state at the location xi, but not on the states of other locations. Denote this likelihood as P(y|xi, i, s, m). We assume that this likelihood is time-invariant, and that the random measurements yi,s,m(t) are conditionally independent of other measurements yj,σ,n(τ) given the location states xi, xj for all sensor modes m, n, provided i ≠ j or τ ≠ t.

Each sensor s has a limited quantity Rs of resources available for measurements. Associated with the use of mode m by sensor s on location i is a resource cost rs(us(t)), representing power or some other type of resource required to use this mode from this sensor:

Σ_{t=0}^{T−1} rs(us(t)) ≤ Rs   ∀ s ∈ [1, . . . , S]   (2)

This is a hard constraint for each realization of observations and decisions.

Let I(t) denote the sequence of past sensing actions and measurement outcomes up to and including stage t − 1:

I(t) = {(us(k), ys(k)), s = 1, . . . , S; k = 0, . . . , t − 1}

Under the assumption of conditional independence of measurements and independence of individual states at each location, the conditional probability of (x1, . . .
, xN ) given I(t) can be factored as a product of belief states at each location. Denote the belief state at location i as πi(t) = p(xi|I(t)). When a sensor measurement is taken, the belief state is updated according to Bayes' Rule. A measurement of location i with the sensor-mode combination us(t) = (i, m) at stage t that generates observable y(t) updates the belief vector as:

πi(t + 1) = diag{P(y(t)|xi = j, i, s, m)} πi(t) / (1ᵀ diag{P(y(t)|xi = j, i, s, m)} πi(t))   (3)

where 1 is the (D + 1)-dimensional vector of all ones. Eq. (3) captures the relevant information dynamics that SM controls.

In addition to information dynamics, there are resource dynamics that characterize the available resources at stage t. The dynamics for sensor s are given as:

Rs(t + 1) = Rs(t) − rs(us(t));   Rs(0) = Rs   (4)

These dynamics constrain the admissible decisions by a sensor, in that it can only use modes that do not use more resources than are available.

Given the final information I(T), the quality of the information collected is measured by making an estimate of the state of each location i given the available information. Denote these estimates as vi, i = 1, . . . , N. The Bayes cost of selecting estimate vi when the true state is xi is denoted as c(xi, vi) ∈ ℝ, with c(xi, vi) ≥ 0. The objective of the SM stochastic control formulation is to minimize

J = Σ_{i=1}^{N} E[c(xi, vi)]   (5)

by selecting adaptive sensor control policies and final estimates subject to the dynamics of Eq. (3) and the constraints of Eq. (4) and Eq. (2).

The results in [21] provide an SDP algorithm to solve the above problem, with cost-to-go at stage t depending on the joint belief state π(t) = [π1(t), . . . , πN (t)] and the residual resource state R(t) = [R1(t), . . . , RS(t)]. Because
of this dependency, the cost-to-go does not decouple over locations. This leads to a very large POMDP problem with combinatorially many actions and an underlying belief state of dimension (D + 1)^N that is computationally intractable unless there are very few locations.

In [21], the above stochastic control problem was replaced with a simpler problem that provided a lower bound on the optimal cost, by expanding the set of admissible strategies, replacing the constraints of Eq. (2) by the "soft" constraints:

E[Σ_{t=0}^{T−1} rs(us(t))] ≤ Rs   ∀ s ∈ [1, . . . , S]   (6)

To solve the simpler problem, [21] proposed incorporating the soft constraints in Eq. (6) into the objective function using Lagrange multipliers λs for each sensor s. The augmented objective function is:

J̄λ = J + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs   (7)

A key result in [21] was that when the optimization of Eq. (7) was done over mixed strategies for given values of the Lagrange multipliers λs, the stochastic control problem decoupled into independent POMDPs for each location, and the optimization could be performed using feedback strategies for each location i that depended only on the information collected for that location, Ii(t). These POMDPs have an underlying information state-space of dimension D + 1, corresponding to the number of possible states at a single location, and can be solved efficiently.

Because the measurements and possible sensor actions are finite-valued, the set of possible SM strategies Γ is also finite. Let Q(Γ) denote the set of mixed strategies that assign probability q(γ) to the choice of strategy γ ∈ Γ. The problem of finding the optimal mixed strategies can be written as:

min_{q∈Q(Γ)} Σ_{γ∈Γ} q(γ) Eγ[J(γ)]   (8)

Σ_{γ∈Γ} q(γ) Eγ[Σ_{i=1}^{N} Σ_{t=0}^{T−1} rs(us(t))] ≤ Rs,   s ∈ [1, . . . , S]   (9)

Σ_{γ∈Γ} q(γ) = 1   (10)

where we have one constraint for each of the S sensor resource pools and an additional simplex constraint in Eq.
(10), which ensures that q ∈ Q(Γ) forms a valid probability distribution. This is a large linear program (LP), where the possible variables are the strategies in Γ. However, the total number of constraints is S + 1, which establishes that optimal solutions of this LP are mixtures of no more than S + 1 strategies. Thus, one can use a column generation approach [22], [23], [24] to quickly identify an optimal mixed strategy. In this approach, one solves Eq. (8) and Eq. (9) restricting the mixed strategies to be mixtures of a small subset Γ′ ⊂ Γ. The solution of the restricted LP has optimal dual prices λs, s = 1, . . . , S. Using these prices, one can determine a corresponding optimal pure strategy by minimizing:

Jλ = Σ_{i=1}^{N} E[c(xi, vi)] + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs   (11)

which the results in [21] show can be decoupled into N independent optimization problems, one for each location. Each of these problems is solved as a POMDP using standard algorithms such as point-based value iteration (PBVI) [25] to determine the best pure strategy γ1 for these prices. If the best pure strategy γ1 is already in the set Γ′, then the solution of Eq. (8) and Eq. (9) restricted to Q(Γ′) is an optimal mixed strategy over all of Q(Γ). Otherwise, the strategy γ1 is added to the admissible set Γ′, and the iteration is repeated. The result is a set of mixed strategies that achieves a performance level that is a lower bound for the original SM optimization problem with hard constraints.

III. COLUMN GENERATION AND POMDP SUBPROBLEM EXAMPLE

We present an example to illustrate the column generation algorithm and POMDP algorithms discussed previously. In this simple example we consider 100 objects (N = 100), 2 possible object types (D = 2) with X = {non-military vehicle, military vehicle}, and 2 sensors that each have one mode (S = 2 and Ms = 1 ∀ s ∈ {1, 2}). Sensor s actions have resource costs rs, where r1 = 1, r2 = 2.
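Before continuing with the example, the per-location belief update of Eq. (3) is worth making concrete. The sketch below is illustrative Python (not from the paper): it applies the diagonal-likelihood update and normalization to a two-state belief, assuming a hypothetical binary sensor that reports the correct class with probability 0.9.

```python
def belief_update(pi, lik):
    # Eq. (3): pi(t+1) = diag{P(y | x_i = j)} pi(t), renormalized.
    # pi[j]  = P(x_i = j | I(t));  lik[j] = P(y | x_i = j) for the observed y.
    unnorm = [l * p for l, p in zip(lik, pi)]  # diag{P(y | x_i = j)} pi(t)
    total = sum(unnorm)                        # 1^T diag{...} pi(t), i.e. P(y)
    return [u / total for u in unnorm]

# Uniform two-state prior and a 90%-accurate binary sensor observing y = 1:
pi = [0.5, 0.5]
lik_y1 = [0.9, 0.1]                      # P(y=1 | type 1), P(y=1 | type 2)
posterior = belief_update(pi, lik_y1)    # -> [0.9, 0.1]
```

Repeated consistent measurements compound multiplicatively before renormalization, which is what the policy graphs of Section III exploit.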
Sensors return 2 possible observation values, corresponding to binary object classifications, with likelihoods:

P(yi,1,1(t)|xi, u1(t)):      P(yi,2,1(t)|xi, u2(t)):
   [0.90  0.10]                 [0.92  0.08]
   [0.10  0.90]                 [0.08  0.92]

where the (j, k) matrix entry denotes the likelihood that y = j if xi = k. The second sensor has 2% better performance than the first sensor but requires twice as many resources to use. Each sensor has Rs = 100 units of resources, and can view each location. Each of the 100 locations has a uniform prior of πi = [0.5 0.5]ᵀ ∀ i. For the performance objective, we use c(xi, vi) = 1 if xi ≠ vi, and 0 otherwise, so the cost is 1 unit for each classification error.

Table I demonstrates the column generation solution process. The first three columns are initialized by guessing values of resource prices and obtaining the POMDP solutions, yielding expected costs and expected resource use for each sensor at those resource prices. A small LP is solved to obtain the optimal mixture of the first three strategies γ1, . . . , γ3, and a corresponding set of dual prices. These dual prices are used in the POMDP solver to generate the fourth column γ4, which yields a strategy that is different from that of the first 3 columns. The LP is re-solved for mixtures of the first 4 strategies, yielding new resource prices that are used to generate the next column. This process continues until the solution using the prices after 7 columns yields a strategy that was already represented in a previous column, terminating
the algorithm. The optimal mixture combines the strategies of the second, fifth and sixth columns. When the master problem converges, the optimal cost, J*, for the mixed strategy is 5.95 units.

                 γ1      γ2     γ3     γ4     γ5     γ6     γ7
min              50.0    2.80   2.44   1.818  8      10     6.22
R1               0       218    200    0      0      100    150     ≤ 100
R2               0       0      36     800    200    0      18      ≤ 100
Simplex          1       1      1      1      1      1      1       = 1
Optimal cost     -       -      26.22  21.28  7.35   5.95   5.95
Mixture weights  0       0.424  0      0      0.500  0.076  0
λ1ᶜ              1.0e15  0.024  0.010  0.238  0.227  0.217  0.061
λ2ᶜ              1.0e15  0.025  0.015  0      0.060  0.210  0.041

TABLE I: Column generation example with 100 objects. The tableau is displayed in its final form after convergence. The λsᶜ rows give the dual-price trajectories up until convergence. R1 and R2 are resource constraints. γ1 is a 'do-nothing' strategy. Bold numbers represent useful solution data.

Fig. 1: The 3 policy graphs that correspond to columns 2, 5 and 6 of Table I. The frequency of choosing each of these 3 strategies is controlled by the relative proportion of the mixture weight qc ∈ (0, 1) with c ∈ {2, 5, 6}.

The resulting policy graphs are illustrated in Fig. 1, where branches up indicate measurements y = 1 ('non-military') and branches down y = 2 ('military'). The red and green nodes denote the final decision, vi, for a location. Note that the strategy of column 5 uses only the second sensor, whereas the strategies of columns 2 and 6 use only the first sensor. The mixed strategy allows the soft resource constraints to be satisfied with equality. Table I also shows the resource costs and expected classification performance of each column.

The example illustrates some of the issues associated with the use of soft constraints in the optimization: the resulting solution does not lead to SM strategies that will always satisfy the hard constraints of Eq. (2). We address this issue in the subsequent section. IV.
RECEDING HORIZON CONTROL

The column generation algorithm described previously solves the approximate SM problem with "soft" constraints in terms of mixed strategies that, on average, satisfy the resource constraints. However, for control purposes, one must select actual SM actions that satisfy the hard constraints of Eq. (2). Another issue is that the solutions of the decoupled POMDPs provide individual sensor schedules for each location that must be interleaved into a single coherent sensor schedule. Furthermore, exact solution of the small decoupled POMDPs for each set of prices can be time-consuming, making the resulting algorithm unsuited for real-time SM.

To address this, we will explore a set of RH algorithms that convert the mixed strategy solutions discussed in the previous section into actions that satisfy the hard constraints, and that limit the computational complexity of the resulting algorithm. The RH algorithms have adjustable parameters whose effects we will explore in simulation.

The RH algorithms start at stage t with an information state/resource state pair, consisting of available information about each location i = 1, . . . , N, represented by the conditional probability vector πi(t), and available sensor resources Rs(t), s = 1, . . . , S. The first step in the algorithms is to solve the SM problem of Eq. (5) starting at stage t to final stage T subject to the soft constraints of Eq. (6), using the hierarchical column generation / POMDP algorithms to get a set of mixed strategies. We introduce a parameter corresponding to the maximum number of sensing actions per location to control the resulting computational complexity of the POMDP algorithms.

The second step is to select sensing actions to implement at the current stage t from the mixed strategies. These strategies are mixtures of at most S + 1 pure strategies, with associated probabilistic weights.
We explore three approaches for selecting sensing actions:

• str1: Select the pure strategy with maximum probability.
• str2: Randomly select a pure strategy per location according to the optimal mixture probabilities.
• str3: Select the pure strategy with positive probability that minimizes the expected sensor resource use (and thus leaves resources for use in future stages).

Once pure strategies for each location have been selected, the third step is to select a sensing action to be implemented for each location. Our approach is to select the first sensing action of the pure strategy for each location. Note that there may not be enough sensor resources to execute the selected actions, particularly in the case where the pure strategy with maximum probability is selected. To address this, we rank sensing actions by their expected entropy gain [26], which is the expected reduction in entropy of the conditional probability distribution, πi(t), based on the anticipated measurement value. We schedule sensor actions in order of decreasing expected entropy gain, and perform those actions at stage t that have enough sensor resources to be feasible.

The measurements collected from the scheduled actions are used to update the information states πi(t + 1) using Eq. (3). The resources used by the actions are subtracted from the available resources to compute Rs(t + 1) using Eq. (4). The RH algorithm is then executed from the new information state/resource state condition.
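The entropy-gain ranking in the third step can be sketched as follows (a minimal Python illustration under our own simplified interface, not the paper's implementation): for each candidate action, average the entropy drop of the Eq. (3) posterior over the possible measurement values, then greedily execute actions in decreasing-gain order while resources last.

```python
from math import log2

def entropy(p):
    # Shannon entropy (bits) of a discrete distribution.
    return -sum(q * log2(q) for q in p if q > 0.0)

def expected_entropy_gain(pi, lik_rows):
    # Expected entropy reduction of belief pi from one measurement, where
    # lik_rows[y][j] = P(y | x_i = j) for the candidate sensor/mode.
    h0 = entropy(pi)
    gain = 0.0
    for lik in lik_rows:
        p_y = sum(l * p for l, p in zip(lik, pi))          # P(y)
        if p_y > 0.0:
            post = [l * p / p_y for l, p in zip(lik, pi)]  # Eq. (3) posterior
            gain += p_y * (h0 - entropy(post))
    return gain

def schedule(actions, budget):
    # actions: (gain, cost) pairs; run in decreasing-gain order while enough
    # resources remain (a simplification of the paper's scheduling step).
    executed = []
    for gain, cost in sorted(actions, key=lambda a: -a[0]):
        if cost <= budget:
            budget -= cost
            executed.append((gain, cost))
    return executed

# A 90%-accurate binary sensor on a uniform prior gains about 0.53 bits:
g = expected_entropy_gain([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
```

Note that greedy scheduling by gain is only a heuristic tiebreaker here; the look-ahead value of an action is already captured by the POMDP policies themselves.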
V. SIMULATION RESULTS

In order to evaluate the relative performance of the different RH algorithms, we performed a set of experiments with simulations. In these experiments, there were 100 locations, each of which could be empty or contain an object of one of three types, so the possible states of location i were xi ∈ {0, 1, 2, 3}, where type 1 represents cars, type 2 trucks, and type 3 military vehicles. Sensors can have several modes: a search mode, a low-resolution mode and a high-resolution mode. The search mode primarily detects the presence of objects; the low-resolution mode can identify cars but confuses the other two types, whereas the high-resolution mode can separate the three types. Observations are modeled as having three possible values. The search mode consumes 0.25 units of resources, whereas the low-resolution mode consumes 1 unit and the high-resolution mode 5 units, uniformly for each sensor and location. Table II shows the likelihood functions that were used in the simulations.

           Search              Low-res             Hi-res
           y1    y2    y3      y1    y2    y3      y1    y2    y3
empty      0.92  0.04  0.04    0.95  0.03  0.02    0.95  0.03  0.02
car        0.08  0.46  0.46    0.05  0.85  0.10    0.02  0.95  0.03
truck      0.08  0.46  0.46    0.05  0.10  0.85    0.02  0.90  0.08
military   0.08  0.46  0.46    0.05  0.10  0.85    0.02  0.03  0.95

TABLE II: Observation likelihoods for different sensor modes with the observation symbols y1, y2 and y3.

Fig. 2: Illustration of scenario with two partially-overlapping sensors.

Initially, each location has a state with one of two prior probability distributions: πi(0) = [0.10 0.60 0.20 0.10]ᵀ for i ∈ [1, . . . , 10], or πi(0) = [0.80 0.12 0.06 0.02]ᵀ for i ∈ [11, . . . , 100]. Thus, the first 10 locations are likely to contain objects, whereas the other 90 locations are likely to be empty. When multiple sensors are present, they may share some locations in common and have locations that can only be seen by a specific sensor, as illustrated in Fig. 2. The cost function used in the experiments, c(xi, vi), is shown in Table III. The parameter MD represents the cost of a missed detection, and will be varied in the experiments.

xi \ vi    empty  car  truck  military
empty      0      1    1      1
car        1      0    0      1
truck      1      0    0      1
military   MD     MD   MD     0

TABLE III: Decision costs

Table IV shows simulation results for a search-and-classify scenario involving 2 identical sensors (with the same visibility), evaluating different versions of the RH control algorithms with different resource levels.

              MD = 1               MD = 5               MD = 10
           str1  str2  str3     str1   str2   str3    str1   str2   str3
Horizon 3
Res 30     3.64  3.85  3.85    11.82  12.88  12.23   15.28  14.57  14.50
Res 50     2.40  2.80  2.43     6.97   6.93   7.84   10.98   9.99  10.45
Res 70     2.45  2.32  1.88     3.44   3.99   4.04    6.14   6.48   5.10
Horizon 4
Res 30     3.58  3.46  3.52    12.28  12.62  11.90   14.48  15.91  15.59
Res 50     2.37  2.21  2.33     7.44   7.44   7.20    9.94   9.28  10.65
Res 70     1.68  1.33  1.60     3.59   3.57   3.62    6.30   5.18   5.86
Horizon 6
Res 30     3.51  3.44  3.73    11.17  11.85  12.09   15.17  14.99  13.6
Res 50     2.28  2.11  2.31     7.29   8.02   7.70   10.67  10.47  11.25
Res 70     1.43  1.38  1.44     3.60   3.73   3.84    4.91   5.09   5.94
Bounds
Res 30     3.35                 11.50                 13.85
Res 50     2.21                  6.27                  9.40
Res 70     1.32                  2.95                  4.96

TABLE IV: Simulation results for 2 homogeneous, multi-modal sensors. str1: select the most likely pure strategy for all locations; str2: randomize the choice of strategy per location according to mixture probabilities; str3: select the strategy that yields the least expected use of resources for all locations.

The variable "Horizon" is the total number of sensor actions allowed per location, plus one additional action for estimating the location content. The table shows results for different resource levels per sensor, from 30 to 70 units, and displays the computed lower-bound performance. The missed detection cost MD is varied from 1 to 10. The results shown in Table IV represent the average of 100 Monte Carlo simulation runs of the RH algorithms.
The results show that using a longer horizon in the planning improves performance only minimally, so an RH replanning approach with a short horizon can be used to reduce computation time with limited performance degradation. The results also show that the different RH algorithms have performance close to the optimal lower bound in most cases, the exception being the case of MD = 5 with 70 units of sensing resources per sensor. For a horizon 6 plan, the longest horizon studied, the simulation performance is close to that of the associated bound.

In terms of which strategy is preferable for converting the mixed strategies, the results of Table IV are unclear. For short planning horizons in the RH algorithms, the preferred strategy appears to be to use the least resources (str3), thus allowing for improvement from replanning. For the longer horizons, there was no significant difference in performance among the three strategies.

To illustrate the computational requirements of this scenario (4 states, 3 observations, 2 sensors (6 actions), full sensor overlap), the number of columns generated by the column generation algorithm to compute a set of mixed strategies was on the order of 10-20 columns for the horizon 6 algorithms, which takes about 60 sec on a 2.2 GHz, single-core, Intel P4 machine under Linux using C code in 'Debug' mode (with 1000 belief points for PBVI). Memory usage without optimizations is around 3 MB. There are typically 4-5 planning sessions in a simulation. Profiling indicates that roughly 80% of the computing time
goes towards value backups in the PBVI routine and 15% goes towards (recursively) tracing decision trees in order to back out (deduce) the measurement costs from hyperplane costs. (Every node in a decision tree / policy graph for each pure strategy has a corresponding hyperplane with a vector of cost coefficients that represent classification plus measurement costs.)

In the next set of experiments, we compare the use of heterogeneous sensors that have different modes available. In these experiments, the 100 locations are guaranteed to have an object, so xi = 0 is not feasible. The prior probability of object type for each location is πi(0) = [0 0.7 0.2 0.1]ᵀ. Table V shows the results of experiments with sensors that have all sensing modes, versus an experiment where one sensor has only a low-resolution mode and the other sensor has both high- and low-resolution modes. The table shows the lower bounds predicted by the column generation algorithm, to illustrate the change in performance expected from the different architectural choices of sensors.

                Homogeneous             Heterogeneous
                MD=1   MD=5   MD=10     MD=1   MD=5   MD=10
Horizon 3
Res[150, 150]   5.69   16.93  30.38     6.34   18.15  31.23
Res[250, 250]   4.61   16.11  25.92     5.53   16.77  29.32
Res[350, 350]   4.23   15.30  21.45     5.12   16.41  27.41
Horizon 4
Res[150, 150]   5.02   16.06  20.61     5.64   16.85  20.61
Res[250, 250]   3.94    9.46  12.66     4.58   12.05  14.87
Res[350, 350]   3.35    8.58  12.47     4.28    9.41  12.65
Horizon 6
Res[150, 150]   4.62   15.66  19.56     5.27   16.20  19.56
Res[250, 250]   2.92    8.24  10.91     3.32    8.83  11.35
Res[350, 350]   2.18    4.86   7.15     2.66    6.63   9.17

TABLE V: Comparison of lower bounds for 2 homogeneous, bi-modal sensors vs. 2 heterogeneous sensors.

The results indicate that specialization of one sensor can lead to significant degradation in performance due to inefficient use of its resources.

The next set of results explores the effect of spatial distribution of sensors.
We consider experiments where there are two homogeneous sensors which have only partially-overlapping coverage zones. (We define a 'visibility group' as a set of sensors that have a common coverage zone.) Table VI gives bounds for different percentages of overlap. Note that, even when there is only 20% overlap, the achievable performance is similar to that of the 100% overlap case in Table V, indicating that proper choice of strategies can lead to efficient sharing of resources from different sensors, equalizing their workload.

                Overlap 60%             Overlap 20%
                MD=1   MD=5   MD=10     MD=1   MD=5   MD=10
Horizon 3
Res[150, 150]   5.69   16.93  30.38     5.69   16.93  30.38
Res[250, 250]   4.61   16.11  25.98     4.61   16.11  25.92
Res[350, 350]   4.23   15.30  21.45     4.23   15.30  21.45
Horizon 4
Res[150, 150]   5.02   16.06  20.61     5.02   15.93  20.61
Res[250, 250]   3.94    9.46  12.66     3.94    9.46  12.66
Res[350, 350]   3.35    8.58  12.47     3.35    8.58  12.47
Horizon 6
Res[150, 150]   4.62   15.66  19.56     4.62   15.66  19.56
Res[250, 250]   2.92    8.25  10.91     2.94    8.24  10.91
Res[350, 350]   2.18    4.86   7.19     2.18    4.86   7.16

TABLE VI: Comparison of performance bounds with 2 homogeneous sensors with partial overlap in coverage. Only the bold numbers are different.

Fig. 3: The 7 visibility groups for the 3 sensor experiment indicating the number of locations in each group.

The last set of results shows the performance of the RH algorithms for three homogeneous sensors with partial overlap and different resource levels. The visibility groups are portrayed graphically in Fig. 3. Table VII presents the simulated cost values averaged over 100 simulations of the different RH algorithms, together with the lower bounds. The results support our previous conclusions: when a short horizon is used in the RH algorithm and there are sufficient resources, the strategy that uses the least resources is preferred, as it allows for replanning when new information is available.
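As a quick quantification of the Table VI comparison: only five entries differ between the 60% and 20% overlap cases, and even the largest relative difference among those pairs is under 1% (simple arithmetic on the tabulated values):

```python
# The (60% overlap, 20% overlap) bound entries from Table VI that differ.
pairs = [(25.98, 25.92), (16.06, 15.93), (2.92, 2.94), (8.25, 8.24), (7.19, 7.16)]

max_rel_diff = max(abs(a - b) / a for a, b in pairs)  # largest relative change
# max_rel_diff is well under 0.01, i.e. under 1%.
```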
If the RH algorithm uses a longer horizon, then its performance approaches the theoretical lower bound, and the difference in performance between the three approaches for sampling the mixed strategy to obtain a pure strategy is statistically insignificant. Our results suggest that RH control with modest horizons of 2 or 3 sensor actions per location can yield performance close to the best achievable performance using mixed strategies. If shorter horizons are used to reduce computation, then an approach that samples mixed strategies by using the smallest amount of resources is preferred.

              MD = 1               MD = 5               MD = 10
           str1  str2  str3     str1   str2   str3    str1   str2   str3
Horizon 3
Res 100    5.26  6.08  5.57    17.23  17.44  16.79   22.02  21.93  22.16
Res 166    5.91  4.81  3.13    10.23  11.91   9.21   14.19  16.66  12.85
Res 233    3.30  3.75  3.43    10.15   9.32   5.88   14.49  12.55   8.21
Horizon 4
Res 100    5.32  5.58  5.93    17.26  16.88  16.17   21.92  20.94  21.35
Res 166    3.42  4.07  3.24     8.63   8.00   9.04   12.05  11.71  14.08
Res 233    3.65  3.07  3.29     5.27   7.14   5.38    8.25  10.08   7.90
Horizon 6
Res 100    5.79  5.51  5.98    17.13  17.90  17.44   22.03  20.56  22.17
Res 166    2.96  2.68  2.72    10.22   8.33   9.08    9.82  11.47  11.57
Res 233    1.52  2.00  1.70     4.81   4.13   4.24    5.64   7.20   5.11
Bounds
Res 100    4.62                 15.66                 19.56
Res 166    2.92                  8.22                 10.89
Res 233    2.18                  4.87                  7.18

TABLE VII: Simulation results for 3 homogeneous sensors with partial overlap as shown in Fig. 3.

The results also show that, with proper SM, geographically distributed sensors with limited visibility can be coordinated to achieve equivalent performance to centrally
pooled resources.

In terms of the computational complexity of our RH algorithms, the main bottleneck is the solution of the POMDP problems. The LPs solved in the column generation approach are small and are solved in minimal time. Solving the POMDPs required to generate each column (one POMDP for each visibility group in cases with partial sensor overlap) is tractable by virtue of the hierarchical breakdown of the SM problem into independent subproblems. It is also very possible to accelerate these computations using multi-core CPU or (NVIDIA) GPU processors, as the POMDPs are highly parallelizable.

VI. CONCLUSIONS

In this paper, we introduce RH algorithms for near-optimal, closed-loop SM with multi-modal, resource-constrained, heterogeneous sensors. These RH algorithms exploit a lower bound formulation developed in earlier work that decomposes the SM optimization into a master problem, which is addressed with linear-programming techniques, and single-location stochastic control problems that are solved using POMDP algorithms. The resulting algorithm generates mixed strategies for sensor plans, and the RH algorithms convert these mixed strategies into sensor actions that satisfy sensor resource constraints.

Our simulation results show that the RH algorithms achieve performance close to that of the theoretical lower bounds in [21]. The results also highlight the different benefits of choosing a longer horizon for RH strategies and alternative approaches to sampling the mixed strategy solutions. Our simulations also show the effects of geographically distributing sensors so that there is limited overlap in field of view, and the effects of specializing sensors by using a restricted number of modes.

There are many interesting directions for extensions to this work. First, one could consider the presence of object dynamics, where objects can arrive at or depart from specific locations.
Second, one could consider options for sensor motion, where individual sensors can change locations and thus observe new areas. Third, one could consider a set of objects that have deterministic, but time-varying, visibility profiles. Finally, one could consider approaches that reduce the computational complexity of the resulting algorithms, either through exploitation of parallel computing architectures or through the use of off-line learning or other approximation techniques.

REFERENCES

[1] B. Koopman, Search and Screening: General Principles with Historical Applications. Pergamon, New York, NY, 1980.
[2] S. J. Benkoski, M. G. Monticino, and J. R. Weisinger, "A survey of the search theory literature," Naval Research Logistics, vol. 38, no. 4, pp. 469–494, 1991.
[3] D. A. Castañón, "Optimal search strategies in dynamic hypothesis testing," IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 7, pp. 1130–1138, Jul. 1995.
[4] A. Wald, "On the efficient design of statistical investigations," The Annals of Mathematical Statistics, vol. 14, pp. 134–140, 1943.
[5] ——, "Sequential tests of statistical hypotheses," The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945. [Online]. Available: http://www.jstor.org/stable/2235829
[6] D. V. Lindley, "On a measure of the information provided by an experiment," Annals of Mathematical Statistics, vol. 27, pp. 986–1005, 1956.
[7] J. C. Kiefer, "Optimum experimental designs," Journal of the Royal Statistical Society, Series B, vol. 21, pp. 272–319, 1959.
[8] H. Chernoff, Sequential Analysis and Optimal Design. SIAM, Philadelphia, PA, 1972.
[9] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, New York, 1972.
[10] K. Kastella, "Discrimination gain to optimize detection and classification," IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 27, no. 1, pp. 112–116, Jan. 1997.
[11] C. Kreucher, K. Kastella, and A. O. Hero III, "Sensor management using an active sensing approach," Signal Processing, vol. 85, no. 3, pp. 607–624, 2005.
[12] M. Athans, "On the determination of optimal costly measurement strategies for linear stochastic systems," Automatica, vol. 8, no. 4, pp. 397–412, 1972. [Online]. Available: http://www.sciencedirect.com/science/article/B6V21-47SV18C-5/2/dc50e03b2ec82f34c592d4056c0da466
[13] V. Krishnamurthy and R. Evans, "Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking," IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 2893–2908, Dec. 2001.
[14] R. Washburn, M. Schneider, and J. Fox, "Stochastic dynamic programming based approaches to sensor resource management," in Proc. Fifth International Conference on Information Fusion, 2002, vol. 1, pp. 608–615.
[15] J. C. Gittins, "Bandit processes and dynamic allocation indices," Journal of the Royal Statistical Society, Series B (Methodological), vol. 41, no. 2, pp. 148–177, 1979. [Online]. Available: http://www.jstor.org/stable/2985029
[16] W. Macready and D. H. Wolpert, "Bandit problems and the exploration/exploitation tradeoff," IEEE Transactions on Evolutionary Computation, vol. 2, no. 1, pp. 2–22, Apr. 1998.
[17] C. Kreucher and A. O. Hero III, "Monte Carlo methods for sensor management in target tracking," in IEEE Nonlinear Statistical Signal Processing Workshop, 2006.
[18] E. Chong, C. Kreucher, and A. Hero, "Monte-Carlo-based partially observable Markov decision process approximations for adaptive sensing," in Proc. 9th International Workshop on Discrete Event Systems (WODES 2008), May 2008, pp. 173–180.
[19] D. A. Castañón, A. Hero, D. Cochran, and K. Kastella, Foundations and Applications of Sensor Management, 1st ed. Springer, 2008, ch. 1.
[20] D. A. Castañón, "Approximate dynamic programming for sensor management," in Proc. 36th Conference on Decision and Control. IEEE, 1997, pp. 1202–1207.
[21] ——, "Stochastic control bounds on sensor network performance," in Proc. 44th IEEE Conference on Decision and Control and 2005 European Control Conference (CDC-ECC '05), Dec. 2005, pp. 4939–4944.
[22] P. C. Gilmore and R. E. Gomory, "A linear programming approach to the cutting-stock problem," Operations Research, vol. 9, no. 6, pp. 849–859, 1961. [Online]. Available: http://www.jstor.org/stable/167051
[23] G. B. Dantzig and P. Wolfe, "The decomposition algorithm for linear programs," Econometrica, vol. 29, no. 4, pp. 767–778, 1961. [Online]. Available: http://www.jstor.org/stable/1911818
[24] K. A. Yost and A. R. Washburn, "The LP/POMDP marriage: Optimization with imperfect information," Naval Research Logistics, vol. 47, no. 8, pp. 607–619, 2000.
[25] J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," in International Joint Conference on Artificial Intelligence (IJCAI), Aug. 2003, pp. 1025–1032.
[26] K. Kastella, "Discrimination gain for sensor management in multitarget detection and tracking," in IEEE-SMC and IMACS Multiconference CESA 1996, vol. 1, Jul. 1996, pp. 167–172.