
Receding Horizon Stochastic Control Algorithms for Sensor Management ACC 2010



Receding Horizon Stochastic Control Algorithms for Sensor Management

Darin Hitchings and David A. Castañón

Abstract— The increasing use of smart sensors that can dynamically adapt their observations has created a need for algorithms to control the information acquisition process. While such problems can usually be formulated as stochastic control problems, the resulting optimization problems are complex and difficult to solve in real-time applications. In this paper, we consider sensor management problems for sensors that are trying to find and classify objects. We propose alternative approaches for sensor management based on receding horizon control, using a stochastic control approximation to the sensor management problem. This approximation can be solved using combinations of linear programming and stochastic control techniques for partially observed Markov decision problems in a hierarchical manner. We explore the performance of our proposed receding horizon algorithms in simulations using heterogeneous sensors, and show that their performance is close to that of a theoretical lower bound. Our results also suggest that a modest horizon is sufficient to achieve near-optimal performance.

I. INTRODUCTION

Recent advances in embedded computing have introduced a new generation of sensors that have the capability of adapting their sensing dynamically in response to collected information. For instance, unmanned aerial vehicles (UAVs) have multiple sensors (radar and electro-optical cameras) which can dynamically change their fields of view and measurement modes. These advances have created a need for a commensurate theory of sensor management (SM) and control to ensure that relevant information is collected for the mission of the sensor system given the available sensor resources. There are numerous applications involving surveillance, diagnosis and fault identification that require such control.
One of the earliest examples of SM arose in the context of search, with applications to anti-submarine warfare [1]. Sensors had the ability to move spatially and allocate their search effort over time and space. Most of the early work on search theory focused on open-loop search plans rather than feedback control of search trajectories [2]. Extensions of search theory to problems requiring adaptive feedback strategies have been developed in some restricted contexts [3].

Adaptive SM has its roots in the field of statistics, where Bayesian experiment design was used to configure subsequent experiments based on observed information. Wald [4], [5] considered sequential hypothesis testing with costly observations. Lindley [6] and Kiefer [7] expanded the concepts to include variations in potential measurements. Chernoff [8] and Fedorov [9] used Cramér-Rao bounds for selecting sequences of measurements for nonlinear regression problems. Most of the strategies proposed for Bayesian experiment design involve single-step optimization criteria, resulting in greedy or myopic strategies that optimize bounds on the expected performance after the next experiment. Other approaches to adaptive SM using single-stage optimization have been proposed using alternative information theoretic measures [10], [11].

Feedback control approaches to SM that consider optimization over time have also been explored. Athans [12] considered a two-point boundary value approach to controlling the error covariance in linear estimators by choosing the measurement matrices. Multi-armed bandit formulations have been used to control individual sensors in applications related to target tracking [13], [14].

(This work was supported by a grant from AFOSR. The authors are with the Dept. of Electrical & Computer Eng., Boston University.)
Such approaches are restricted to single-sensor control, selecting among individual subproblems to measure, in order to obtain solutions using Gittins indices [15], [16]. Approximate dynamic programming (DP) techniques have also been proposed, using approximations to the optimal cost-to-go based on information theoretic measures evaluated with Monte Carlo techniques [17], [18]. A good overview of these techniques is available in [19].

The above approaches for dynamic feedback control are limited in application to problems with a small number of sensor-action choices and simple constraints, because the algorithms must enumerate and evaluate the various control actions. In [20], combinatorial optimization techniques are integrated into a DP formulation to obtain approximate stochastic dynamic programming (SDP) algorithms that extend to large numbers of sensor actions. Subsequent work in [21] derived an SDP formulation using partially observed Markov decision processes (POMDPs) and obtained a computable lower bound on the achievable performance of feedback strategies for complex multi-sensor management problems. The lower bound was obtained by a convex relaxation of the original combinatorial POMDP using mixed strategies and averaged constraints. However, the results in [21] do not specify algorithms with performance close to the lower bound.

In this paper, we develop and implement algorithms for the efficient computation of adaptive SM strategies for complex problems involving multiple sensors with different observation modes and large numbers of objects. The algorithms are based on using the lower bound formulation from [21] as an objective in a receding horizon (RH) optimization problem and developing techniques for obtaining feasible decisions from the mixed strategy solutions. The resulting
algorithms are scalable to large numbers of tasks, and suitable for real-time SM. We also extend the model of [21] to incorporate search actions in addition to classification. We evaluate alternative approaches for obtaining feasible decision strategies, and assess the resulting performance of the RH algorithms using multi-sensor simulations. Our simulation results demonstrate that our RH algorithms achieve performance comparable to the predicted lower bound of [21], and shed insight into the relative value of different strategies for partitioning sensor resources either geographically or by sensor specialization.

The rest of this paper is organized as follows: Section II describes the formulation of the stochastic SM problem. Section III provides an example of the column generation technique for generating mixed strategies for SM. Section IV discusses how we create feasible, sequenced sensor schedules from these mixed strategies. Section V documents our simulation results for various scenarios. Section VI summarizes our results and discusses areas for future work.

II. PROBLEM FORMULATION AND BACKGROUND

The problem formulation is an extension of the POMDP formulation presented in [21]. Assume that there are a finite number of locations 1, . . . , N, each of which may have an object with a given type, or which may be empty. Assume that there is a set of S sensors, each of which has multiple sensor modes, and that each sensor can observe one and only one location at each time with a selected mode. Let xi ∈ {0, 1, . . . , D} denote the state of location i, where xi = 0 if location i is unoccupied, and otherwise xi = k > 0 indicates that location i has an object of type k. Let πi(0) ∈ Δ^(D+1) be a discrete probability distribution over the possible states of the ith location, for i = 1, . . . , N, where D ≥ 2 and Δ^(D+1) denotes the probability simplex in D + 1 dimensions. Assume additionally that the random variables xi, i = 1, . . . , N are mutually independent.

There are s = 1, . . . , S sensors, each of which has m = 1, . . . , Ms possible modes of observation. We assume there is a series of T discrete decision stages at which sensors can select which location to measure, where T is large enough so that all of the sensors can use their available resources. At each stage, each sensor can choose to employ one and only one of its modes on a single location to collect a noisy measurement concerning the state xi at that location. Each sensor s has a limited set of locations that it can observe, denoted by Os ⊆ {1, . . . , N}. A sensor action by sensor s at stage t is a pair

  us(t) = (is(t), ms(t))   (1)

consisting of a location to observe, is ∈ Os, and a mode for that observation, ms. Sensor measurements are modeled as belonging to a finite set y ∈ {1, . . . , Ls}. The likelihood of the measured value is assumed to depend on the sensor s, sensor mode m, location i and the true state xi at the location, but not on the states of other locations. Denote this likelihood as P(y|xi, i, s, m). We assume that this likelihood is time-invariant, and that the random measurements yi,s,m(t) are conditionally independent of other measurements yj,σ,n(τ) given the location states xi, xj for all sensor modes m, n, provided i ≠ j or τ ≠ t.

Each sensor s has a limited quantity Rs of resources available for measurements. Associated with the use of mode m by sensor s on location i is a resource cost rs(us(t)), representing power or some other type of resource required to use this mode from this sensor. Resource use is constrained as:

  Σ_{t=0}^{T−1} rs(us(t)) ≤ Rs   ∀ s ∈ {1, . . . , S}   (2)

This is a hard constraint for each realization of observations and decisions. Let I(t) denote the sequence of past sensing actions and measurement outcomes up to and including stage t − 1:

  I(t) = {(us(k), ys(k)), s = 1, . . . , S; k = 0, . . . , t − 1}

Under the assumption of conditional independence of measurements and independence of individual states at each location, the conditional probability of (x1, . . .
, xN) given I(t) can be factored as a product of belief states at each location. Denote the belief state at location i as πi(t) = P(xi|I(t)). When a sensor measurement is taken, the belief state is updated according to Bayes' Rule. A measurement of location i with the sensor-mode combination us(t) = (i, m) at stage t that generates observable y(t) updates the belief vector as:

  πi(t + 1) = diag{P(y(t)|xi = j, i, s, m)} πi(t) / ( 1^T diag{P(y(t)|xi = j, i, s, m)} πi(t) )   (3)

where 1 is the (D + 1)-dimensional vector of all ones. Eq. (3) captures the relevant information dynamics that SM controls. In addition to the information dynamics, there are resource dynamics that characterize the available resources at stage t. The dynamics for sensor s are given as:

  Rs(t + 1) = Rs(t) − rs(us(t));   Rs(0) = Rs   (4)

These dynamics constrain the admissible decisions by a sensor, in that it can only use modes that do not require more resources than are available.

Given the final information I(T), the quality of the information collected is measured by making an estimate of the state of each location i given the available information. Denote these estimates as vi, i = 1, . . . , N. The Bayes cost of selecting estimate vi when the true state is xi is denoted c(xi, vi) ∈ ℝ, with c(xi, vi) ≥ 0. The objective of the SM stochastic control formulation is to minimize:

  J = Σ_{i=1}^{N} E[c(xi, vi)]   (5)

by selecting adaptive sensor control policies and final estimates subject to the dynamics of Eq. (3) and the constraints of Eq. (4) and Eq. (2).

The results in [21] provide an SDP algorithm to solve the above problem, with cost-to-go at stage t depending on the joint belief state π(t) = [π1(t), . . . , πN(t)] and the residual resource state R(t) = [R1(t), . . . , RS(t)]. Because
of this dependency, the cost-to-go does not decouple over locations. This leads to a very large POMDP problem with combinatorially many actions and an underlying belief state of dimension (D + 1)^N, which is computationally intractable unless there are very few locations.

In [21], the above stochastic control problem was replaced with a simpler problem that provides a lower bound on the optimal cost, by expanding the set of admissible strategies and replacing the constraints of Eq. (2) by the "soft" constraints:

  E[ Σ_{t=0}^{T−1} rs(us(t)) ] ≤ Rs   ∀ s ∈ {1, . . . , S}   (6)

To solve the simpler problem, [21] proposed incorporating the soft constraints of Eq. (6) into the objective function using Lagrange multipliers λs for each sensor s. The augmented objective function is:

  J̄_λ = J + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs   (7)

A key result in [21] was that when the optimization of Eq. (7) was done over mixed strategies for given values of the Lagrange multipliers λs, the stochastic control problem decoupled into independent POMDPs for each location, and the optimization could be performed using feedback strategies for each location i that depend only on the information collected for that location, Ii(t). These POMDPs have an underlying information state-space of dimension D + 1, corresponding to the number of possible states at a single location, and can be solved efficiently.

Because the measurements and possible sensor actions are finite-valued, the set of possible SM strategies Γ is also finite. Let Q(Γ) denote the set of mixed strategies that assign probability q(γ) to the choice of strategy γ ∈ Γ. The problem of finding the optimal mixed strategies can be written as:

  min_{q ∈ Q(Γ)} Σ_{γ∈Γ} q(γ) E_γ[J(γ)]   (8)

subject to:

  Σ_{γ∈Γ} q(γ) E_γ[ Σ_{i=1}^{N} Σ_{t=0}^{T−1} rs(us(t)) ] ≤ Rs,   s ∈ {1, . . . , S}   (9)

  Σ_{γ∈Γ} q(γ) = 1   (10)

where we have one constraint for each of the S sensor resource pools and an additional simplex constraint in Eq.
(10), which ensures that q ∈ Q(Γ) forms a valid probability distribution. This is a large linear program (LP), whose possible variables are the strategies in Γ. However, the total number of constraints is S + 1, which establishes that optimal solutions of this LP are mixtures of no more than S + 1 strategies. Thus, one can use a column generation approach [22], [23], [24] to quickly identify an optimal mixed strategy. In this approach, one solves Eq. (8) and Eq. (9) restricting the mixed strategies to be mixtures over a small subset Γ' ⊂ Γ. The solution of the restricted LP has optimal dual prices λs, s = 1, . . . , S. Using these prices, one can determine a corresponding optimal pure strategy by minimizing:

  Jλ = Σ_{i=1}^{N} E[c(xi, vi)] + Σ_{t=0}^{T−1} Σ_{s=1}^{S} λs E[rs(us(t))] − Σ_{s=1}^{S} λs Rs   (11)

which the results in [21] show can be decoupled into N independent optimization problems, one for each location. Each of these problems is solved as a POMDP using standard algorithms, such as point-based value iteration (PBVI) [25], to determine the best pure strategy γ1 for these prices. If the best pure strategy γ1 is already in the set Γ', then the solution of Eq. (8) and Eq. (9) restricted to Q(Γ') is an optimal mixed strategy over all of Q(Γ). Otherwise, the strategy γ1 is added to the admissible set Γ', and the iteration is repeated. The result is a set of mixed strategies that achieves a performance level that is a lower bound for the original SM optimization problem with hard constraints.

III. COLUMN GENERATION AND POMDP SUBPROBLEM EXAMPLE

We present an example to illustrate the column generation and POMDP algorithms discussed previously. In this simple example we consider 100 objects (N = 100), 2 possible object types (D = 2) with X = {non-military vehicle, military vehicle}, and 2 sensors that each have one mode (S = 2 and Ms = 1 ∀ s ∈ {1, 2}). Sensor s actions have resource costs rs, where r1 = 1, r2 = 2.
Sensors return 2 possible observation values, corresponding to binary object classifications, with likelihoods:

  P(y_{i,1,1}(t)|xi, u1(t)) = [0.90 0.10; 0.10 0.90]
  P(y_{i,2,1}(t)|xi, u2(t)) = [0.92 0.08; 0.08 0.92]

where the (j, k) matrix entry denotes the likelihood that y = j if xi = k. The second sensor has 2% better performance than the first sensor, but requires twice as many resources to use. Each sensor has Rs = 100 units of resources, and can view each location. Each of the 100 locations has a uniform prior of πi = [0.5 0.5]^T ∀ i. For the performance objective, we use c(xi, vi) = 1 if xi ≠ vi, and 0 otherwise, so the cost is 1 unit per classification error.

Table I demonstrates the column generation solution process. The first three columns are initialized by guessing values of resource prices and obtaining the POMDP solutions, yielding expected costs and expected resource use for each sensor at those resource prices. A small LP is solved to obtain the optimal mixture of the first three strategies γ1, . . . , γ3, and a corresponding set of dual prices. These dual prices are used in the POMDP solver to generate the fourth column γ4, which yields a strategy that differs from those of the first 3 columns. The LP is re-solved for mixtures of the first 4 strategies, yielding new resource prices that are used to generate the next column. This process continues until the solution using the prices after 7 columns yields a strategy that was already represented in a previous column, terminating
the algorithm. The optimal mixture combines the strategies of the second, fifth and sixth columns. When the master problem converges, the optimal cost J* for the mixed strategy is 5.95 units.

TABLE I: Column generation example with 100 objects. The tableau is displayed in its final form after convergence. The λc_s rows trace the lambda trajectories up to convergence. R1 and R2 are resource constraints. γ1 is a 'do-nothing' strategy. Bold numbers represent useful solution data.

                   γ1      γ2     γ3     γ4     γ5     γ6     γ7
  min              50.0    2.80   2.44   1.818  8      10     6.22
  R1               0       218    200    0      0      100    150   ≤ 100
  R2               0       0      36     800    200    0      18    ≤ 100
  Simplex          1       1      1      1      1      1      1     = 1
  Optimal cost     -       -      26.22  21.28  7.35   5.95   5.95
  Mixture weights  0       0.424  0      0      0.500  0.076  0
  λc_1             1.0e15  0.024  0.010  0.238  0.227  0.217  0.061
  λc_2             1.0e15  0.025  0.015  0      0.060  0.210  0.041

The resulting policy graphs are illustrated in Fig. 1, where upward branches indicate measurements y = 1 ('non-military') and downward branches y = 2 ('military'). The red and green nodes denote the final decision, vi, for a location.

Fig. 1: The 3 policy graphs that correspond to columns 2, 5 and 6 of Table I. The frequency of choosing each of these 3 strategies is controlled by the relative proportion of the mixture weight qc ∈ (0, 1), with c ∈ {2, 5, 6}.

Note that the strategy of column 5 uses only the second sensor, whereas the strategies of columns 2 and 6 use only the first sensor. The mixed strategy allows the soft resource constraints to be satisfied with equality. Table I also shows the resource costs and expected classification performance of each column. The example illustrates some of the issues associated with the use of soft constraints in the optimization: the resulting solution does not lead to SM strategies that will always satisfy the hard constraints of Eq. (2). We address this issue in the subsequent section.

IV.
RECEDING HORIZON CONTROL

The column generation algorithm described previously solves the approximate SM problem with "soft" constraints in terms of mixed strategies that, on average, satisfy the resource constraints. However, for control purposes, one must select actual SM actions that satisfy the hard constraints of Eq. (2). Another issue is that the solutions of the decoupled POMDPs provide individual sensor schedules for each location that must be interleaved into a single coherent sensor schedule. Furthermore, exact solution of the small decoupled POMDPs for each set of prices can be time consuming, making the resulting algorithm unsuited for real-time SM. To address this, we explore a set of RH algorithms that convert the mixed strategy solutions discussed in the previous section into actions that satisfy the hard constraints, and that limit the computational complexity of the resulting algorithm. The RH algorithms have adjustable parameters whose effects we explore in simulation.

The RH algorithms start at stage t with an information state/resource state pair, consisting of the available information about each location i = 1, . . . , N, represented by the conditional probability vector πi(t), and the available sensor resources Rs(t), s = 1, . . . , S. The first step in the algorithms is to solve the SM problem of Eq. (5) from stage t to the final stage T subject to the soft constraints of Eq. (6), using the hierarchical column generation / POMDP algorithms to get a set of mixed strategies. We introduce a parameter corresponding to the maximum number of sensing actions per location to control the resulting computational complexity of the POMDP algorithms.

The second step is to select sensing actions to implement at the current stage t from the mixed strategies. These strategies are mixtures of at most S + 1 pure strategies, with associated probabilistic weights.
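The restricted master LP at the heart of this first step, and the dual prices it passes back to the POMDP pricing subproblems, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the candidate columns (expected costs and expected resource use per sensor) are stand-ins for the output of a POMDP solver, and the function name is our own.

```python
import numpy as np
from scipy.optimize import linprog

def solve_restricted_master(costs, resource_use, budgets):
    """Restricted master LP of Eq. (8)-(10) over a small set of candidate
    pure strategies (columns): minimize expected cost subject to expected
    resource use <= budget for each sensor, with mixture weights forming
    a probability distribution.  Returns the mixture weights q(gamma) and
    the dual prices lambda_s of the resource constraints."""
    n_cols = len(costs)
    res = linprog(
        c=costs,                        # expected cost of each pure strategy
        A_ub=resource_use,              # (S x K) expected resource use per sensor
        b_ub=budgets,                   # resource budget R_s for each sensor
        A_eq=np.ones((1, n_cols)),      # simplex constraint, Eq. (10)
        b_eq=[1.0],
        bounds=[(0, None)] * n_cols,
        method="highs",
    )
    weights = res.x
    # The (negated) marginals of the resource constraints act as the
    # Lagrange multipliers lambda_s fed to the POMDP subproblems.
    lambdas = -np.asarray(res.ineqlin.marginals)
    return weights, lambdas
```

For instance, feeding in a cheap 'do-nothing' column plus two resource-hungry columns yields a mixture that saturates the affordable columns, together with nonnegative prices λs for the two resource pools; a new column would then be generated by solving the per-location POMDPs at those prices.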
We explore three approaches for selecting sensing actions:

  • str1: Select the pure strategy with maximum probability.
  • str2: Randomly select a pure strategy per location according to the optimal mixture probabilities.
  • str3: Select the pure strategy with positive probability that minimizes the expected sensor resource use (and thus leaves resources for use in future stages).

Once pure strategies for each location have been selected, the third step is to select a sensing action to be implemented for each location. Our approach is to select the first sensing action of the pure strategy for each location. Note that there may not be enough sensor resources to execute all of the selected actions, particularly in the case where the pure strategy with maximum probability is selected. To address this, we rank sensing actions by their expected entropy gain [26], which is the expected reduction in entropy of the conditional probability distribution πi(t) based on the anticipated measurement value. We schedule sensor actions in order of decreasing expected entropy gain, and perform those actions at stage t that have enough sensor resources to be feasible.

The measurements collected from the scheduled actions are used to update the information states πi(t + 1) using Eq. (3). The resources used by the actions are subtracted from the available resources to compute Rs(t + 1) using Eq. (4). The RH algorithm is then executed from the new information state/resource state condition.
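The per-location belief update of Eq. (3) and the expected entropy gain used to rank candidate actions can be sketched as below. This is a simplified sketch: `likelihood[y, x]` stands for P(y | xi = x, i, s, m) for one fixed sensor-mode pair, and the function names are illustrative.

```python
import numpy as np

def bayes_update(belief, likelihood, y):
    """Eq. (3): posterior belief over a location's states after observing y,
    where likelihood[y, x] = P(y | x) for the chosen sensor/mode."""
    post = likelihood[y, :] * belief
    return post / post.sum()

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def expected_entropy_gain(belief, likelihood):
    """Expected reduction in belief entropy from one measurement,
    averaged over the predicted observation distribution."""
    gain = entropy(belief)
    for y in range(likelihood.shape[0]):
        p_y = likelihood[y, :] @ belief      # predictive probability of y
        if p_y > 0:
            gain -= p_y * entropy(bayes_update(belief, likelihood, y))
    return gain
```

The first actions of the selected pure strategies would then be sorted by this gain and executed in decreasing order, skipping any action whose resource cost rs(us(t)) no longer fits within the remaining budget Rs(t) of Eq. (4).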
TABLE II: Observation likelihoods for the different sensor modes, with observation symbols y1, y2 and y3.

            Search              Low-res             Hi-res
            y1    y2    y3      y1    y2    y3      y1    y2    y3
  empty     0.92  0.04  0.04    0.95  0.03  0.02    0.95  0.03  0.02
  car       0.08  0.46  0.46    0.05  0.85  0.10    0.02  0.95  0.03
  truck     0.08  0.46  0.46    0.05  0.10  0.85    0.02  0.90  0.08
  military  0.08  0.46  0.46    0.05  0.10  0.85    0.02  0.03  0.95

Fig. 2: Illustration of a scenario with two partially-overlapping sensors.

V. SIMULATION RESULTS

In order to evaluate the relative performance of the different RH algorithms, we performed a set of simulation experiments. In these experiments, there were 100 locations, each of which could be empty or could contain an object of one of three types, so the possible states of location i were xi ∈ {0, 1, 2, 3}, where type 1 represents cars, type 2 trucks, and type 3 military vehicles. Sensors can have several modes: a search mode, a low-resolution mode and a high-resolution mode. The search mode primarily detects the presence of objects; the low-resolution mode can identify cars but confuses the other two types, whereas the high-resolution mode can separate the three types. Observations are modeled as having three possible values. The search mode consumes 0.25 units of resources, whereas the low-resolution mode consumes 1 unit and the high-resolution mode 5 units, uniformly for each sensor and location. Table II shows the likelihood functions that were used in the simulations.

Initially, each location has a state with one of two prior probability distributions: πi(0) = [0.10 0.60 0.20 0.10]^T for i ∈ [1, . . . , 10], or πi(0) = [0.80 0.12 0.06 0.02]^T for i ∈ [11, . . . , 100]. Thus, the first 10 locations are likely to contain objects, whereas the other 90 locations are likely to be empty. When multiple sensors are present, they may share some locations in common, and have locations that can only be seen by a specific sensor, as illustrated in Fig. 2. The cost function used in the experiments, c(xi, vi), is shown in Table III.
The parameter MD represents the cost of a missed detection, and is varied in the experiments.

TABLE III: Decision costs c(xi, vi).

  xi \ vi    empty  car  truck  military
  empty      0      1    1      1
  car        1      0    0      1
  truck      1      0    0      1
  military   MD     MD   MD     0

Table IV shows simulation results for a search-and-classify scenario involving 2 identical sensors (with the same visibility), evaluating different versions of the RH control algorithms at different resource levels. The variable "Horizon" is the total number of sensor actions allowed per location, plus one additional action for estimating the location content. The table shows results for different resource levels per sensor, from 30 to 70 units, and displays the computed lower bound on performance. The missed detection cost MD is varied from 1 to 10.

TABLE IV: Simulation results for 2 homogeneous, multi-modal sensors. str1: select the most likely pure strategy for all locations; str2: randomize the choice of strategy per location according to mixture probabilities; str3: select the strategy that yields the least expected use of resources for all locations.

           MD = 1               MD = 5                 MD = 10
  Hor. 3   str1  str2  str3     str1   str2   str3     str1   str2   str3
  Res 30   3.64  3.85  3.85     11.82  12.88  12.23    15.28  14.57  14.50
  Res 50   2.40  2.80  2.43     6.97   6.93   7.84     10.98  9.99   10.45
  Res 70   2.45  2.32  1.88     3.44   3.99   4.04     6.14   6.48   5.10
  Hor. 4
  Res 30   3.58  3.46  3.52     12.28  12.62  11.90    14.48  15.91  15.59
  Res 50   2.37  2.21  2.33     7.44   7.44   7.20     9.94   9.28   10.65
  Res 70   1.68  1.33  1.60     3.59   3.57   3.62     6.30   5.18   5.86
  Hor. 6
  Res 30   3.51  3.44  3.73     11.17  11.85  12.09    15.17  14.99  13.6
  Res 50   2.28  2.11  2.31     7.29   8.02   7.70     10.67  10.47  11.25
  Res 70   1.43  1.38  1.44     3.60   3.73   3.84     4.91   5.09   5.94
  Bounds
  Res 30   3.35                 11.50                  13.85
  Res 50   2.21                 6.27                   9.40
  Res 70   1.32                 2.95                   4.96

The results shown in Table IV represent the average of 100 Monte Carlo simulation runs of the RH algorithms.
The results show that using a longer horizon in the planning improves performance only minimally, so an RH replanning approach with a short horizon can reduce computation time with limited performance degradation. The results also show that the different RH algorithms have performance close to the optimal lower bound in most cases, the exception being the case of MD = 5 with 70 units of sensing resources per sensor. For a horizon-6 plan, the longest horizon studied, the simulation performance is close to that of the associated bound.

In terms of which strategy is preferable for converting the mixed strategies, the results of Table IV are unclear. For short planning horizons in the RH algorithms, the preferred strategy appears to be to use the least resources (str3), thus allowing for improvement from replanning. For the longer horizons, there was no significant difference in performance among the three strategies.

To illustrate the computational requirements of this scenario (4 states, 3 observations, 2 sensors (6 actions), full sensor overlap): the number of columns generated by the column generation algorithm to compute a set of mixed strategies was on the order of 10-20 columns for the horizon-6 algorithms, which takes about 60 sec on a 2.2 GHz, single-core, Intel P4 machine under Linux using C code in 'Debug' mode (with 1000 belief points for PBVI). Memory usage without optimizations is around 3 MB. There are typically 4-5 planning sessions in a simulation. Profiling indicates that roughly 80% of the computing time
goes towards value backups in the PBVI routine, and 15% goes towards (recursively) tracing decision trees in order to back out (deduce) the measurement costs from the hyperplane costs. (Every node in a decision tree / policy graph for each pure strategy has a corresponding hyperplane with a vector of cost coefficients that represents classification plus measurement costs.)

In the next set of experiments, we compare the use of heterogeneous sensors that have different modes available. In these experiments, the 100 locations are guaranteed to have an object, so xi = 0 is not feasible. The prior probability of object type for each location is πi(0) = [0 0.7 0.2 0.1]^T. Table V shows the results of experiments with sensors that have all sensing modes, versus an experiment where one sensor has only a low-resolution mode and the other sensor has both high and low-resolution modes. The table shows the lower bounds predicted by the column generation algorithm, to illustrate the change in performance expected from the different architectural choices of sensors.

TABLE V: Comparison of lower bounds for 2 homogeneous, bi-modal sensors vs. 2 heterogeneous sensors.

                  Homogeneous (MD)      Heterogeneous (MD)
  Horizon 3       1     5      10       1     5      10
  Res[150, 150]   5.69  16.93  30.38    6.34  18.15  31.23
  Res[250, 250]   4.61  16.11  25.92    5.53  16.77  29.32
  Res[350, 350]   4.23  15.30  21.45    5.12  16.41  27.41
  Horizon 4
  Res[150, 150]   5.02  16.06  20.61    5.64  16.85  20.61
  Res[250, 250]   3.94  9.46   12.66    4.58  12.05  14.87
  Res[350, 350]   3.35  8.58   12.47    4.28  9.41   12.65
  Horizon 6
  Res[150, 150]   4.62  15.66  19.56    5.27  16.20  19.56
  Res[250, 250]   2.92  8.24   10.91    3.32  8.83   11.35
  Res[350, 350]   2.18  4.86   7.15     2.66  6.63   9.17

The results indicate that specialization of one sensor can lead to significant degradation in performance, due to inefficient use of its resources.

The next set of results explores the effect of the spatial distribution of sensors.
We consider experiments where there are two homogeneous sensors that have only partially-overlapping coverage zones. (We define a 'visibility group' as a set of sensors that share a common coverage zone.) Table VI gives bounds for different percentages of overlap. Note that, even when there is only 20% overlap, the achievable performance is similar to that of the 100% overlap case in Table V, indicating that a proper choice of strategies can lead to efficient sharing of resources from different sensors and an equalization of their workload.

TABLE VI: Comparison of performance bounds for 2 homogeneous sensors with partial overlap in coverage. Only the bold numbers differ between the two overlap levels.

                  Overlap 60% (MD)      Overlap 20% (MD)
  Horizon 3       1     5      10       1     5      10
  Res[150, 150]   5.69  16.93  30.38    5.69  16.93  30.38
  Res[250, 250]   4.61  16.11  25.98    4.61  16.11  25.92
  Res[350, 350]   4.23  15.30  21.45    4.23  15.30  21.45
  Horizon 4
  Res[150, 150]   5.02  16.06  20.61    5.02  15.93  20.61
  Res[250, 250]   3.94  9.46   12.66    3.94  9.46   12.66
  Res[350, 350]   3.35  8.58   12.47    3.35  8.58   12.47
  Horizon 6
  Res[150, 150]   4.62  15.66  19.56    4.62  15.66  19.56
  Res[250, 250]   2.92  8.25   10.91    2.94  8.24   10.91
  Res[350, 350]   2.18  4.86   7.19     2.18  4.86   7.16

Fig. 3: The 7 visibility groups for the 3-sensor experiment, indicating the number of locations in each group.

The last set of results shows the performance of the RH algorithms for three homogeneous sensors with partial overlap and different resource levels. The visibility groups are portrayed graphically in Fig. 3. Table VII presents the simulated cost values, averaged over 100 simulations, of the different RH algorithms, together with the lower bounds. The results support our previous conclusions: when a short horizon is used in the RH algorithm, and there are sufficient resources, the strategy that uses the least resources is preferred, as it allows for replanning when new information is available.
If the RH algorithm uses a longer horizon, then its performance approaches the theoretical lower bound, and the difference in performance between the three approaches for sampling the mixed strategy to obtain a pure strategy is statistically insignificant.

TABLE VII: Simulation results for 3 homogeneous sensors with partial overlap, as shown in Fig. 3.

            MD = 1               MD = 5                 MD = 10
  Horizon 3 str1  str2  str3     str1   str2   str3     str1   str2   str3
  Res 100   5.26  6.08  5.57     17.23  17.44  16.79    22.02  21.93  22.16
  Res 166   5.91  4.81  3.13     10.23  11.91  9.21     14.19  16.66  12.85
  Res 233   3.30  3.75  3.43     10.15  9.32   5.88     14.49  12.55  8.21
  Horizon 4
  Res 100   5.32  5.58  5.93     17.26  16.88  16.17    21.92  20.94  21.35
  Res 166   3.42  4.07  3.24     8.63   8.00   9.04     12.05  11.71  14.08
  Res 233   3.65  3.07  3.29     5.27   7.14   5.38     8.25   10.08  7.90
  Horizon 6
  Res 100   5.79  5.51  5.98     17.13  17.90  17.44    22.03  20.56  22.17
  Res 166   2.96  2.68  2.72     10.22  8.33   9.08     9.82   11.47  11.57
  Res 233   1.52  2.00  1.70     4.81   4.13   4.24     5.64   7.20   5.11
  Bounds
  Res 100   4.62                 15.66                  19.56
  Res 166   2.92                 8.22                   10.89
  Res 233   2.18                 4.87                   7.18

Our results suggest that RH control with modest horizons of 2 or 3 sensor actions per location can yield performance close to the best achievable performance using mixed strategies. If shorter horizons are used to reduce computation, then an approach that samples mixed strategies by using the smallest amount of resources is preferred. The results also show that, with proper SM, geographically distributed sensors with limited visibility can be coordinated to achieve performance equivalent to centrally
  7. 7. pooled resources. In terms of the computational complexity of our RH al- gorithms, the main bottleneck is the solution of the POMDP problems. The LPs solved in the column generation approach are small and are solved in minimal time. Solving the POMDPs required to generate each column (one POMDP for each visibility group in cases with partial sensor overlap) is tractable by virtue of the hierarchical breakdown of the SM problem into independent subproblems. It is also very possible to accelerate these computations using multi-core CPU or (NVIDIA) GPU processors, as the POMDPs are highly parallelizable. VI. CONCLUSIONS In this paper, we introduce RH algorithms for near- optimal, closed-loop SM with multi-modal, resource- constrained, heterogeneous sensors. These RH algorithms exploit a lower bound formulation developed in earlier work that decomposes the SM optimization into a master problem, which is addressed with linear-programming techniques, and single location stochastic control problems that are solved using POMDP algorithms. The resulting algorithm generates mixed strategies for sensor plans, and the RH algorithms convert these mixed strategies into sensor actions that satisfy sensor resource constraints. Our simulation results show that the RH algorithms achieve performance close to that of the theoretical lower bounds in [21]. The results also highlight the different benefits of choosing a longer horizon for RH strategies and alternative approaches at sampling the mixed strategy solu- tions. Our simulations also show the effects of geographically distributing sensors so that there is limited overlap in field of view and the effects of specializing sensors by using a restricted number of modes. There are many interesting directions for extensions to this work. First, one could consider the presence of object dynamics, where objects can arrive at or depart from specific locations. 
Second, one could consider options for sensor motion, where individual sensors can change locations and thus observe new areas. Third, one could consider a set of objects that have deterministic, but time-varying, visibility profiles. Finally, one could consider approaches that reduce the computational complexity of the resulting algorithms, either through exploitation of parallel computing architectures or through the use of off-line learning or other approximation techniques.

REFERENCES

[1] B. Koopman, Search and Screening: General Principles with Historical Applications. Pergamon, New York, NY, 1980.
[2] S. J. Benkoski, M. G. Monticino, and J. R. Weisinger, "A survey of the search theory literature," Naval Research Logistics, vol. 38, no. 4, pp. 469–494, 1991.
[3] D. A. Castañón, "Optimal search strategies in dynamic hypothesis testing," Systems, Man and Cybernetics, IEEE Transactions on, vol. 25, no. 7, pp. 1130–1138, Jul. 1995.
[4] A. Wald, "On the efficient design of statistical investigations," The Annals of Mathematical Statistics, vol. 14, pp. 134–140, 1943.
[5] ——, "Sequential tests of statistical hypotheses," The Annals of Mathematical Statistics, vol. 16, no. 2, pp. 117–186, 1945.
[6] D. V. Lindley, "On a measure of the information provided by an experiment," Annals of Mathematical Statistics, vol. 27, pp. 986–1005, 1956.
[7] J. C. Kiefer, "Optimum experimental designs," Journal of the Royal Statistical Society, Series B, vol. 21, pp. 272–319, 1959.
[8] H. Chernoff, Sequential Analysis and Optimal Design. SIAM, Philadelphia, PA, 1972.
[9] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, New York, 1972.
[10] K. Kastella, "Discrimination gain to optimize detection and classification," Systems, Man and Cybernetics, Part A, IEEE Transactions on, vol. 27, no. 1, pp. 112–116, Jan. 1997.
[11] C. Kreucher, K. Kastella, and A. O. Hero III, "Sensor management using an active sensing approach," Signal Processing, vol.
85, no. 3, pp. 607–624, 2005.
[12] M. Athans, "On the determination of optimal costly measurement strategies for linear stochastic systems," Automatica, vol. 8, no. 4, pp. 397–412, 1972.
[13] V. Krishnamurthy and R. Evans, "Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking," Signal Processing, IEEE Transactions on, vol. 49, no. 12, pp. 2893–2908, Dec. 2001.
[14] R. Washburn, M. Schneider, and J. Fox, "Stochastic dynamic programming based approaches to sensor resource management," in Proceedings of the Fifth International Conference on Information Fusion, 2002, vol. 1, pp. 608–615.
[15] J. C. Gittins, "Bandit processes and dynamic allocation indices," Journal of the Royal Statistical Society, Series B (Methodological), vol. 41, no. 2, pp. 148–177, 1979.
[16] W. Macready and D. H. Wolpert, "Bandit problems and the exploration/exploitation tradeoff," Evolutionary Computation, IEEE Transactions on, vol. 2, no. 1, pp. 2–22, Apr. 1998.
[17] C. Kreucher and A. O. Hero III, "Monte Carlo methods for sensor management in target tracking," in IEEE Nonlinear Statistical Signal Processing Workshop, 2006.
[18] E. Chong, C. Kreucher, and A. Hero, "Monte-Carlo-based partially observable Markov decision process approximations for adaptive sensing," in 9th International Workshop on Discrete Event Systems (WODES 2008), May 2008, pp. 173–180.
[19] D. A. Castañón, A. Hero, D. Cochran, and K. Kastella, Foundations and Applications of Sensor Management, 1st ed. Springer, 2008, ch. 1.
[20] D. A. Castañón, "Approximate dynamic programming for sensor management," in Proc. 36th Conference on Decision and Control. IEEE, 1997, pp. 1202–1207.
[21] ——, "Stochastic control bounds on sensor network performance," in Proc. 44th IEEE Conference on Decision and Control and 2005 European Control Conference (CDC-ECC '05), Dec. 2005, pp. 4939–4944.
[22] P. C.
Gilmore and R. E. Gomory, "A linear programming approach to the cutting-stock problem," Operations Research, vol. 9, no. 6, pp. 849–859, 1961.
[23] G. B. Dantzig and P. Wolfe, "The decomposition algorithm for linear programs," Econometrica, vol. 29, no. 4, pp. 767–778, 1961.
[24] K. A. Yost and A. R. Washburn, "The LP/POMDP marriage: Optimization with imperfect information," Naval Research Logistics, vol. 47, no. 8, pp. 607–619, 2000.
[25] J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," in International Joint Conference on Artificial Intelligence (IJCAI), Aug. 2003, pp. 1025–1032.
[26] K. Kastella, "Discrimination gain for sensor management in multitarget detection and tracking," in IEEE-SMC and IMACS Multiconference CESA 1996, vol. 1, Jul. 1996, pp. 167–172.