244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 1, FEBRUARY 2012

solutions exploits the receiver diversity gain in the context of opportunistic routing. However, for the sake of completeness, we provide a brief overview of the existing approaches. The authors in – focus on heuristic routing algorithms that adaptively identify the least congested path in a wired network. If the network congestion, hence delay, were to be replaced by time-invariant quantities,1 the heuristics in – would become a special case of d-AdaptOR in a network with deterministic channels and with no receiver diversity. In this light, Theorem 1 in Section IV provides analytic guarantees for the heuristics obtained in –. In , analytic results for ant routing are obtained in wired networks without opportunism. Ant routing uses ant-like probes to find paths of optimal costs such as expected hop count, expected delay, and packet loss probability.2 This dependence on ant-like probing represents a stark difference with our approach, where d-AdaptOR relies solely on data packets for exploration.

1 The delay and congestion are highly time-varying quantities.

2 Here, we note that unlike congestion or instantaneous delay, the expected delay under a stable and stationary routing algorithm is indeed time-invariant, and allows for a similar mathematically sound treatment.

The rest of the paper is organized as follows. In Section II, we discuss the system model and formulate the problem. Section III formally introduces our proposed adaptive routing algorithm, d-AdaptOR. We then state and prove the optimality theorem for d-AdaptOR in Section IV. In Section V, we present the implementation details and practical issues of d-AdaptOR. We perform a simulation study of d-AdaptOR in Section VI. Finally, we conclude the paper and discuss future work in Section VII.

II. SYSTEM MODEL

We consider the problem of routing packets from a source node 0 to a destination node in a wireless ad hoc network of nodes denoted by the set . The time is slotted and indexed by (this assumption is not technically critical and is only assumed for ease of exposition). A packet indexed by is generated at the source node 0 at time according to an arbitrary distribution with rate .

We assume a fixed transmission cost is incurred upon a transmission from node . Transmission cost can be considered to model the amount of energy used for transmission, the expected time to transmit a given packet, or the hop count when the cost is set to unity. We consider an opportunistic routing setting with no duplicate copies of the packets. In other words, at a given time only one node is responsible for routing any given packet. Given a successful packet transmission from node to the set of neighbor nodes , the next (possibly randomized) routing decision includes: 1) retransmission by node ; 2) relaying the packet by a node ; or 3) dropping the packet altogether. If node is selected as a relay, then it transmits the packet at the next slot, while other nodes , expunge that packet.

We define the termination event for packet to be the event that packet is either received at the destination or is dropped by a relay before reaching the destination. We denote this termination action by . We define termination time to be the stopping time when packet is terminated. We discriminate among the termination events as follows. We assume that upon the termination of a packet at the destination (successful delivery of a packet to the destination), a fixed and given positive delivery reward is obtained, while no reward is obtained if the packet is terminated before it reaches the destination. Let denote this random reward obtained at the termination time , i.e., either zero if the packet is dropped prior to reaching the destination node or if the packet is received at the destination.

Let denote the index of the node which at time transmits packet , and accordingly let denote the cost of transmission (equal to zero if at time packet is not transmitted). The routing scheme can be viewed as selecting a (random) sequence of nodes for relaying packets .3 As such, the expected average per-packet reward associated with routing packets along a sequence of up to time is

(1)

where denotes the number of packets terminated up to time and the expectation is taken over the events of transmission decisions, successful packet receptions, and packet generation times.

3 Packets are indexed according to the termination order.

Problem (P): Choose a sequence of relay nodes in the absence of knowledge about the network topology such that is maximized as .

In Section III, we propose the d-AdaptOR algorithm, which solves Problem (P). The nature of the algorithm allows nodes to make routing decisions in a distributed, asynchronous, and adaptive manner.

Remark 1: The problem of opportunistic routing for multiple source–destination pairs, without loss of generality, can be decomposed to the single source–destination problem described above [Problem (P) is solved for each distinct flow].

III. DISTRIBUTED ALGORITHM

Before we proceed with the description of d-AdaptOR, we provide the following notations. Let denote the set of neighbors of node including node itself. Let denote the set of potential reception outcomes due to a transmission from node , i.e., . We refer to as the state space for node 's transmission. Furthermore, let . Let denote the space of all allowable actions available to node upon successful reception at nodes in . Finally, for each node , we define a reward function on states and potential decisions as

 if
 if and
 if but

A. Overview of d-AdaptOR

As discussed before, the routing decision at any given time is made based on the reception outcome and involves retransmission, choosing the next relay, or termination. Our proposed
scheme makes such decisions in a distributed manner via the following three-way handshake between node and its neighbors .

1) At time , node transmits a packet.
2) The set of nodes who have successfully received the packet from node , transmit acknowledgment (ACK) packets to node . In addition to the node's identity, the acknowledgment packet of node includes a control message known as estimated best score (EBS) and denoted by .
3) Node announces node as the next transmitter or announces the termination decision in a forwarding (FO) packet.

BHORKAR et al.: ADAPTIVE OPPORTUNISTIC ROUTING FOR WIRELESS AD HOC NETWORKS 245

The routing decision of node at time is based on an adaptive (stored) score vector . The score vector lies in space , where , and is updated by node using the EBS messages obtained from neighbors . Furthermore, node uses a set of counting variables and and a sequence of positive scalars to update its score vector at time . The counting variable is equal to the number of times neighbors have received (and acknowledged) the packets transmitted from node and routing decision has been made up to time . Similarly, is equal to the number of times the set of nodes has received (and acknowledged) packets transmitted from node up to time . Lastly, is a fixed sequence of numbers available at all nodes.

Table I provides notations used in the description of the algorithm, while Fig. 1 gives an overview of the components of the algorithm. Next, we present further details.

TABLE I. NOTATIONS USED IN THE DESCRIPTION OF THE ALGORITHM

Fig. 1. Flow of the algorithm. The algorithm follows a four-stage procedure: transmission, acknowledgment, relay, and update.

B. Detailed Description of d-AdaptOR

The operation of d-AdaptOR can be described in terms of initialization and four stages of transmission, reception and acknowledgment, relay, and adaptive computation as shown in Fig. 1. For simplicity of presentation, we assume a sequential timing for each of the stages. We use to denote some (small) time after the start of th slot and to denote some (small) time before the end of th slot such that .

0) Initialization: . For all , initialize , while .

1) Transmission Stage: Transmission stage occurs at time in which node transmits if it has a packet.

2) Reception and Acknowledgment Stage: Let denote the (random) set of nodes that have received the packet transmitted by node . In the reception and acknowledgment stage, successful reception of the packet transmitted by node is acknowledged to it by all the nodes in . We assume that the delay for the acknowledgment stage is small enough (not more than the duration of the time slot) such that node infers by time .

For all nodes , the ACK packet of node to node includes the EBS message .

Upon reception and acknowledgment, the counting random variable is incremented as follows:

 if
 if

3) Relay Stage: Node selects a routing action according to the following (randomized) rule parameterized by .

• With probability

 is selected.4

4 In case of ambiguity, node with the smallest index is chosen.
• With probability

 is selected uniformly with probability .

Node transmits FO, a control packet that contains information about routing decision at some time strictly between and . If , then node prepares for forwarding in the next time slot, while nodes expunge the packet. If termination action is chosen, i.e., , all nodes in expunge the packet.

Upon selection of routing action, the counting variable is updated

 if
 if

4) Adaptive Computation Stage: At time , after being done with transmission and relaying, node updates score vector as follows.

• For

(2)

• Otherwise

(3)

Furthermore, node updates its EBS message for future acknowledgments as

C. Computational Issues

The computational complexity and control overhead of d-AdaptOR are low.

1) Complexity: To execute stochastic recursion (2), the number of computations required per packet is order of at each time slot. The space complexity of d-AdaptOR is exponential in the number of neighbors, i.e., for each node. The reduction in storage requirement using approximation techniques in is left as future work.

2) Control Overhead: The number of acknowledgments per packet is order of , independent of network size.

3) Exploration Overhead: The adaptation to the optimal performance in the network is guaranteed via a controlled randomized routing strategy that can be viewed as cost of exploration. The cost of exploration is proportional to the total number of packets whose routes deviate from the optimal path. In the proof of Theorem 1, we show that this cost increases sublinearly with the number of delivered packets, hence the per-packet exploration cost diminishes as the number of delivered packets grows. Additionally, communication of adds a very modest overhead to the genie-aided or greedy-based schemes such as ExOR or SR.

IV. ANALYTIC OPTIMALITY OF D-ADAPTOR

We will now state the main result establishing the optimality of the proposed d-AdaptOR algorithm under the assumptions of a time-invariant model of packet reception and reliable control packets. More precisely, we have the following assumptions.

Assumption 1: The probability of successful reception of a packet transmitted by node at set of nodes is , independent of time and all other routing decisions.

The probabilities in Assumption 1 characterize a packet reception model that we refer to as local broadcast model. Note that for all , successful reception at and are mutually exclusive and . Furthermore, logically node is always a recipient of its own transmission, i.e., iff .

Assumption 2: The successful reception at set due to transmission from node is acknowledged perfectly to node .

Remark 2: Assumption 1 is in line with the experimentally tested state-of-the-art routing protocols MORE and ExOR . These studies seem to indicate that reasonably simple probabilistic models provide good abstractions of media access control (MAC) and physical (PHY) layers at the routing layer.

Remark 3: In practice, Assumption 2 is hard to satisfy. But as we will see in Section VI, when the rates and power of the control packets are set to maximize the reliability, the impact of violating this assumption can be kept extremely low.

Remark 4: In Section VI, we address the severity as well as the implications of Assumptions 1 and 2. In particular, via a set of QualNet simulations, we will show that d-AdaptOR exhibits many of its desirable properties in a realistic setup despite the relaxation of the analytical assumptions.

Given Assumptions 1 and 2, we are almost ready to present Theorem 1 regarding the optimality of d-AdaptOR among the class of policies that are oblivious to the network topology and/or channel statistics. More precisely, let a distributed routing policy be a collection of routing decisions taken at nodes , where denotes a sequence of random actions for node . The policy is said to be (P)-admissible if for all nodes , the event belongs to the σ-field generated by the observations at node , i.e., . Let denote the set of such (P)-admissible policies. Theorem 1 states that d-AdaptOR, denoted by , is an optimal (P)-admissible policy.

Theorem 1: Suppose and Assumptions 1 and 2 hold. Then, for all
where and are the expectations taken with respect to policies and , respectively.5

5 This is a strong notion of optimality and implies that the proposed algorithm's expected average reward is greater than the best-case performance of all policies [18, p. 344].

Next, we prove the optimality of d-AdaptOR in two steps. In the first step, we show that converges in an almost sure sense. In the second step, we use this convergence result to show that d-AdaptOR is optimal for Problem (P).

A. Convergence of

Let be an operator on vector such that

(4)

Let denote the fixed point of operator ,6 i.e.,

(5)

6 Existence and uniqueness of is provided in Appendix-A.

The following lemma establishes the convergence of recursion (2) to the fixed point of .

Lemma 1: Let:
J1) for all ;
J2) .
Then, the sequence obtained by the stochastic recursion (2) converges to almost surely.

The proof uses known results on the convergence of a certain recursive stochastic process as presented by Fact 2 in Appendix-A.

B. Proof of Optimality

Using the convergence of , we show that the expected average per-packet reward under d-AdaptOR is equal to the optimal expected average per-packet reward obtained for a genie-aided system where the local broadcast model is known perfectly. In other words, we take our cue from known results associated with a closely related Auxiliary Problem (AP). In this Auxiliary Problem (AP), there exists a centralized controller with full knowledge of the local broadcast model as well as the transmission outcomes across the network , . The objective in the Auxiliary Problem (AP) is a single-packet variation of that in Problem (P): the reward

for routing a single packet from the source to the destination is maximized over a set of (AP)-admissible policies, where this set of (AP)-admissible policies is a superset of (P)-admissible policies that also includes all topology-aware and centralized policies. This Auxiliary Problem (AP) has been extensively studied in , , and , where a Markov decision formulation provides the following important result.

Fact 1 [6, Theorem 2.1]: Consider the unique solution to the following fixed-point equation:

(6)

(7)

There exists an optimal topology-aware and centralized admissible policy such that

(8)

Lemma 2 states the relationship between the solution of Problem (P) and that of the Auxiliary Problem (AP). More specifically, Lemma 2 shows that is an upper bound for the solution to Problem (P).

Lemma 2: For any (P)-admissible policy for Problem (P) and for all

The proof is given in Appendix-B. Intuitively, the result holds because the set of (P)-admissible policies is a subset of (AP)-admissible policies, i.e., .

Lemma 3 gives the achievability proof by showing that the expected average per-packet reward of d-AdaptOR is lower-bounded by .

Lemma 3: For any

The proof is given in Appendix-C. Lemmas 2 and 3 imply that [which is (P)-admissible by construction] is an optimal policy under which

exists and is equal to , establishing the proof of Theorem 1.

Corollary 1: When , the network is connected, and is greater than the worst-case routing cost,7 d-AdaptOR minimizes

(9)

the expected per-packet delivery time as .

7 The worst-case routing cost can be determined by taking supremum over ETX metrics for all source–destination pairs.
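To make the fixed-point characterization behind Fact 1 and Corollary 1 more tangible, the following is a minimal numerical sketch on a toy network. It assumes a Bellman-type form often used for opportunistic-routing MDPs: the value at the destination equals the delivery reward, and elsewhere V(i) = max{0, -c_i + sum over S of P(S|i) * max over j in S of V(j)}. This form, the names `value_iteration`, `P`, `cost`, `R`, and the three-node topology are illustrative assumptions and are not taken from (6)–(8).

```python
# Toy value iteration for an assumed fixed-point equation of the form
#   V(dest) = R,   V(i) = max(0, -c_i + sum_S P(S|i) * max_{j in S} V(j)).
# All names and numbers are hypothetical; they only illustrate the idea.

def value_iteration(P, cost, R, dest, iters=1000):
    """Iterate the assumed Bellman operator starting from V = 0 (V[dest] = R)."""
    n = len(P)
    V = [0.0] * n
    V[dest] = R
    for _ in range(iters):
        new_V = list(V)
        for i in range(n):
            if i == dest:
                continue
            # Value of transmitting: pay the cost, then hand the packet to the
            # best node among those that received it (node i itself included).
            relay = -cost[i] + sum(
                p * max(V[j] for j in S) for S, p in P[i].items()
            )
            # Dropping the packet yields zero reward.
            new_V[i] = max(0.0, relay)
        V = new_V
    return V


# Local broadcast model: P[i] maps a reception set S (always containing i)
# to its probability. Node 0 is the source, node 2 the destination.
P = [
    {frozenset({0}): 0.3, frozenset({0, 1}): 0.5, frozenset({0, 1, 2}): 0.2},
    {frozenset({1}): 0.2, frozenset({1, 2}): 0.8},
    {},  # the destination never transmits
]
cost = [1.0, 1.0, 0.0]  # unit transmission cost at relays (assumption)
R = 20.0                # delivery reward, well above any routing cost here

V = value_iteration(P, cost, R, dest=2)
expected_tx = R - V[0]  # expected number of unit-cost transmissions

print(round(V[1], 3), round(V[0], 3), round(expected_tx, 3))
# prints 18.75 17.679 2.321
```

In this example the reward is large enough that dropping is never preferred, so R minus the source value is exactly the expected number of unit-cost transmissions under the optimal policy, which is the kind of quantity Corollary 1 is concerned with.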
This is because when is sufficiently large, and the network is connected

V. PROTOCOL DESIGN AND IMPLEMENTATION ISSUES

In this section, we describe an 802.11 compatible implementation for d-AdaptOR.

A. 802.11 Compatible Implementation

The implementation of d-AdaptOR, analogous to any opportunistic routing scheme, involves the selection of a relay node among the candidate set of nodes that have received and acknowledged a packet successfully. One of the major challenges in the implementation of an opportunistic routing algorithm in general, and the d-AdaptOR algorithm in particular, is the design of an 802.11 compatible acknowledgment mechanism at the MAC layer. We propose a practical and simple way to implement the acknowledgment architecture.

The transmission at any node is done according to an 802.11 CSMA/CA mechanism. Specifically, before any transmission, the transmitter performs channel sensing and starts transmission after the backoff counter is decremented to zero. For each neighbor node , the transmitter node then reserves a virtual time slot of duration , where is the duration of the acknowledgment packet and is the duration of Short InterFrame Space (SIFS) . The transmitter then piggybacks a priority ordering of nodes with each data packet transmitted. The priority ordering determines the virtual time slot in which the candidate nodes transmit their acknowledgment. Nodes in the set that have successfully received the packet then transmit acknowledgment packets sequentially in the order determined by the transmitter node.

After a waiting time of during which each node in the set has had a chance to send an ACK, node transmits a FOrwarding control packet (FO). The FO packets contain the identity of the next forwarder, which may be node again or any node . If expires and no FO packet is received (FO packet reception is unsuccessful), then the corresponding candidate nodes drop the received data packet. If the transmitter does not receive any acknowledgment, node retransmits the packet. The backoff window is doubled after every retransmission. Furthermore, the packet is dropped if the retry limit (set to 7) is reached.

In addition to the acknowledgment scheme, d-AdaptOR requires modifications to the 802.11 MAC frame format. Fig. 2 shows the modified MAC frame formats required by d-AdaptOR. The reserved bits in the type/subtype fields of the frame control field of the 802.11 MAC specification are used to indicate whether the rest of the frame is a d-AdaptOR data frame, a d-AdaptOR ACK, or an FO.8 The data frame contains the candidate set in priority order, the payload, and the 802.11 Frame Check Sequence. The acknowledgment frame includes the data frame sender's address and the feedback EBS . The FO packet is exactly the same as a standard 802.11 short control frame that uses a different subtype value.

8 This enables the d-AdaptOR to communicate and be fully compatible with other 802.11 devices.

Fig. 2. Frame structure of the data packets, acknowledgment packets, and FO packets.

B. d-AdaptOR in a Realistic Setting

1) Loss of ACK and FO Packets: Interference or low signal-to-noise ratio (SNR) can cause loss of ACK and FO packets. Loss of an ACK packet results in an incorrect estimation of nodes that have received the packet, and thus affects the performance of the algorithm. Loss of an FO packet negatively impacts the throughput performance of the network. In particular, loss of an FO packet can result in the drop of data packets at all the potential relays, reducing the throughput performance. Hence, in our design, FO packets are transmitted at lower rates to ensure a reliable transmission.

2) Increased Overhead: As is the case with any opportunistic scheme, d-AdaptOR adds a modest additional overhead to the standard 802.11 due to the added acknowledgment/handshake structure. This overhead increases linearly with the number of neighbors. Assuming an 802.11b physical layer operating at 11 Mb/s with an SIFS time of 10 µs, preamble duration of 20 µs, Physical Layer Convergence Protocol (PLCP) header duration of 4 µs, and 512-B frame payloads, Table II compares the overhead in the data packet due to piggybacking and the control overhead due to ACK and FO packets for unicast 802.11, genie-aided opportunistic scheme, and d-AdaptOR. d-AdaptOR requires communication overhead of 4 extra bytes (for EBS) per ACK packet compared to the genie-aided opportunistic scheme, while unicast 802.11 does not require such overhead.

Note that the overhead cost can be reduced by restricting the number of nodes in the candidate list of the MAC header to a given number, MAX-NEIGHBOUR. The unique ordering for the nodes in the candidate set is determined by prioritizing the nodes with respect to and then
choosing the MAX-NEIGHBOUR highest priority nodes.9 Such a limitation will sacrifice the diversity gain and, hence, the performance of any opportunistic routing algorithm for lower overhead. In practice, we have seen that limiting the neighbor set to 4 provides most of the diversity gain.

9 In case of ambiguity, the node with the smallest index is chosen.

TABLE II. OVERHEAD COMPARISONS

VI. SIMULATIONS

In this section, we provide simulation studies in realistic wireless settings where the theoretical assumptions of our study do not hold. These simulations not only demonstrate a robust performance gain under d-AdaptOR in a realistic network, but also provide significant insight into the appropriate choice of the design parameters such as damping sequence , delivery reward , etc. We first investigate the performance of d-AdaptOR with respect to the design parameters and network parameters in a grid topology of 16 nodes. We then use a realistic topology of 36 nodes with random placement to demonstrate robustness of d-AdaptOR to the violation of the analytic Assumptions 1 and 2.

A. Simulation Setup

In Sections VI-B and VI-C, using the appropriate choice of the design parameters, we compare the performance of d-AdaptOR against suitably chosen candidates. As a benchmark, when appropriate, we have compared the performance against a genie-aided policy that relies on full network topology information when selecting routes. This is nothing but the policy discussed in Section IV-B. We also compare against Stochastic Routing (SR) (SR is the distributed implementation of policy ) and ExOR (an opportunistic routing policy with ETX metric) in which the empirical probabilistic structure of the network is used to implement opportunistic routing algorithms. As a result, their performance will be highly dependent on the precision of empirical probability associated with link . To provide a fair comparison, we have considered simple greedy versions of SR and ExOR. These algorithms adapt to the history of packet reception outcomes and rely on the updates to make routing decisions assuming error-free . We have also compared our performance against a conventional routing SRCR with full knowledge of topology. In this setting, a conventional route is selected with perfect knowledge of link success probability at any given node. This comparison in effect provides a simple benchmark for all learning-based conventional routing policies in the literature such as Q-routing and predictive Q-routing when congestion is taken to be small enough (such that finding least congested paths coincides with finding the path with minimum expected number of transmissions).

Our simulations are performed in QualNet. We consider two sets of topologies in our experimental study.

1) Grid Topology: In Section VI-B, we study a grid topology consisting of 16 indoor nodes such that the nearest neighbors are separated by distance meters. If unspecified, is chosen to be 25 m. The source and the destination are chosen at the maximal distance (on diagonal) from each other.

2) Random Topology: In Section VI-C, we study a random topology consisting of 36 indoor nodes placed in an area of 150 × 150 m². Here, we investigate the performance under a multisource multidestination setting as the number of flows in the network is varied and each flow is specified via a randomly selected pair of source and destination.

The nodes are equipped with 802.11b radios placed in an indoor environment transmitting at 11 Mb/s with transmission power 15 dBm. Note that the choice of indoor environment is motivated by the findings in , where opportunistic routing is found to provide significant diversity gains. The wireless medium model includes Rician fading with K-factor of 4 and log-normal shadowing with mean 4 dB. The path loss follows the two-ray model in with path exponent of 3. The acknowledgment packets are short packets of length 24 B transmitted at 11 Mb/s, while FO packets are of length 20 B and transmitted at a lower rate of 1 Mb/s to ensure reliability. If unspecified, packets are generated according to a constant bit rate (CBR) source with rate 20 packets/s. The packets are assumed to be of length 512 B equipped with simple cyclic redundancy check (CRC) error detection. The cost of transmission is assumed to be one unit, and the reward is set to 40. We have chosen as the exploration parameter of choice.

B. Effects of Design and Network Parameters

Here, we investigate the role and criticality of various design parameters of d-AdaptOR with respect to the expected number of transmissions criterion. Let us start with design parameters and .

1) Exploration Parameter Sequence : The convergence rate of stochastic recursion (2) depends strongly on the choice of sequence . Convergence is slower with a faster decreasing sequence and results in less variance in the estimates of , while with a slow decreasing sequence of , convergence is fast but results in large variance in the estimates of . In Fig. 3, we have plotted the effect of the choice of sequence by comparing two sequences and . Note that under sequence , d-AdaptOR is slower to adapt to the optimal performance while it shows a slightly smaller variance. This is because the choice of controls the rate with which greedy versus (randomly chosen) exploration actions are utilized. The optimization of the choice of is an interesting topic of study in stochastic approximation , , far beyond the scope of this work.

2) Per-Packet Delivery Reward : To ensure an acceptable performance of d-AdaptOR, the value of delivery reward, , must be chosen sufficiently high. This would ensure the existence of routes under which the value of delivering a packet (as represented in ) is worth (i.e., larger than) the cost of
relaying and routing that packet. A reasonable choice of is any value larger than the worst-case expected transmission cost. Increasing beyond such a value does not affect the asymptotic optimality of the algorithm. Next, we study the performance of d-AdaptOR with respect to the convergence rate and delivery ratio.

Fig. 3. Comparison for .

Fig. 4 plots the expected number of transmissions rate as time progresses for various values of . As seen in Fig. 4, if increases beyond a threshold (in the example provided here, this threshold is 18, but in general it depends on the network diameter), the expected number of transmissions per packet achieves the optimal value of . In contrast, for , the expected number of transmissions approaches zero as the packets not worth obtaining routing reward are dropped.10 Fig. 4 also shows that the convergence rate of the expected number of transmissions for routing per packet under d-AdaptOR decreases as increases. The slow convergence for for large is due to the flexibility of exploring longer paths. The slow convergence to zero for near is attributed to the fact that it takes a longer time for d-AdaptOR to realize that the packet is not worth relaying.

10 For , we have plotted the negative of the expected per-packet reward as the expected number of transmissions.

Fig. 4. Expected number of transmissions versus time as is varied.

Fig. 5 plots the delivery ratio as is varied. Fig. 5 shows that as increases beyond a threshold , the delivery ratio remains fixed. However, for sufficiently small , nearly all the packets are dropped as the cost of transmission of the packet as well as relaying is not worth the obtained delivery reward. Due to very slow convergence rate around for , we observe that a nonnegligible number of packets is delivered in the duration of the experiment.

Fig. 5. Delivery ratio as is varied.

Next, we investigate the performance of d-AdaptOR with respect to other candidate protocols for the network parameters such as packet length, traffic rate, neighbor distance, and time-varying costs.

3) Packet Length: We have repeated our simulations for 1024-B packets. Fig. 6 plots the performance as the packet length is varied from 512 to 1024 B. Note that due to the decreasing packet transmission reliabilities, the expected routing cost per packet is increased with the packet size. However, the optimality of d-AdaptOR does not depend on the packet length.

Fig. 6. d-AdaptOR performance as packet length is varied.

4) Traffic Rate: Fig. 7 plots the mean number of transmissions versus CBR rate for candidate algorithms. Even though the performance gain for d-AdaptOR decreases somewhat with an increase in the load, there is always a nonnegligible advantage over greedy solutions.

Fig. 7. Performance of d-AdaptOR as CBR traffic is varied.

5) Average Hop Length : In an attempt to understand the performance gap between various opportunistic algorithms, specifically the gap between d-AdaptOR and learning-based conventional routing algorithms – whose performance is bounded by SRCR, one needs to gain insight about the diversity gain achieved by opportunistic routing. Fig. 8 compares the expected transmission cost for the three opportunistic routing algorithms (d-AdaptOR, ExOR, and SR) and SRCR as the
distance between the neighboring nodes in the grid topology, measured in meters, is varied from 10 to 30 m. Note that for high values of , the receiver diversity is low due to retransmission packet losses giving nearly similar performance for candidate protocols, while small corresponds to a network with large receiver diversity gain. As expected, when is small, all opportunistic routing schemes provide a significant improvement over conventional routing, but perhaps what is more interesting is the performance gain of learning-based d-AdaptOR over the greedy-based solutions in medium ranges.

Fig. 8. Small hops provide significant receiver diversity gain.

6) Time-Varying Cost: In our analytical setup, we assume the transmission costs are fixed. Next, we discuss a simple scenario where the nodes have time-varying transmission costs. Consider a network in which nodes may go into an energy-saving mode when they do not participate in routing (e.g., to recharge their energy sources). Assume that upon entering the energy-saving mode, a node announces a high cost of transmission (100 instead of the usual transmission cost of 1). Fig. 9 plots the expected average cost of d-AdaptOR when two nodes at the center of the grid move into an energy-saving mode. It shows that d-AdaptOR can track the genie-aided solution after the nodes move into the energy-saving mode.

Fig. 9. Time-varying cost: Nodes go into sleep mode at time 300 s.

C. Case Study: Random Network

Here, we study a random network scenario consisting of 36 wireless nodes placed randomly, with the remaining parameters kept the same as the default parameters.

Fig. 10 plots the expected number of transmissions and the expected average per-packet reward for the candidate routing algorithms versus network operation time when a single flow is present in the random topology. We first note that, as expected, SRCR performs poorly compared to the opportunistic schemes as it fails to utilize the receiver diversity gain. This underlines our contribution over all existing learning-based solutions – that ignore receiver diversity. Furthermore, Fig. 10 shows that the d-AdaptOR algorithm outperforms the greedy opportunistic schemes given a sufficient number of packet deliveries. This is because the greedy versions of SR and ExOR fail to explore possible choices of routes and often result in strictly suboptimal routing policies. Fig. 10 also shows that the randomized routing decisions employed by d-AdaptOR work as a double-edged sword. On the one hand, they form a mechanism through which network opportunities are exhaustively explored until the globally optimal decisions are constructed, resulting in an improved long-term performance; on the other hand, these randomized decisions lead to a short-term performance loss. This, in fact, is reminiscent of the well-known exploration/exploitation tradeoff in stochastic control and learning literature.

Fig. 10. Expected number of transmissions and average per-packet reward as function of operation time.

Fig. 11. d-AdaptOR versus distributed SR, ExOR, and SRCR performance for multiple flows.

Next, we study the performance of d-AdaptOR as the number of flows in the network is varied, where each flow is specified via a randomly selected pair of source and destination. Fig. 11 plots the expected number of transmissions and expected average reward for the candidate routing algorithms for the random topology. As seen in Fig. 11, d-AdaptOR maintains an
optimal performance. However, Fig. 11 also shows that the gap between d-AdaptOR and the greedy version of SR significantly decreases with an increase in the number of flows, where the natural pattern of traffic flow renders the (randomized) exploration phase less critical. In other words, while Fig. 11 is consistent with Remark 1 in Section II regarding the decomposition of the multiple-flow scenario into multiple single-flow scenarios, it also suggests that a joint design in which the multiplicity of flows provides a natural (and greedy) exploration of the network might be beneficial with regard to the transient/short-term performance measures of interest.

VII. CONCLUSION AND FUTURE WORK

In this paper, we proposed d-AdaptOR, a distributed, adaptive, and opportunistic routing algorithm whose performance is shown to be optimal with zero knowledge regarding network topology and channel statistics. More precisely, under idealized assumptions, d-AdaptOR is shown to achieve the performance of an optimal routing with perfect and centralized knowledge about network topology, where the performance is measured in terms of the expected per-packet reward. Furthermore, we show that d-AdaptOR allows for a practical distributed and asynchronous 802.11 compatible implementation, whose performance was investigated via a detailed set of QualNet simulations under practical and realistic networks. Simulations show that d-AdaptOR consistently outperforms existing adaptive routing algorithms in practical settings.

The long-term average reward criterion investigated in this paper inherently ignores the short-term performance. To capture the performance of various adaptive schemes, however, it is desirable to study the performance of the algorithms over a finite horizon. One popular way to study this is via measuring the incurred “regret” over a finite horizon. Regret is a function of the horizon that quantifies the loss of performance under a given adaptive algorithm relative to the performance of the topology-aware optimal one. More specifically, our results so far imply that the optimal rate of growth of regret is strictly sublinear in the horizon, but fail to provide a conclusive understanding of the short-term behavior of d-AdaptOR. An important area of future work comprises developing adaptive algorithms that ensure optimal growth rate of regret.

The design of routing protocols requires a consideration of congestion control along with the throughput performance , . Our work, however, does not consider this closely related issue. Incorporating congestion control in opportunistic routing algorithms to minimize expected delay without knowledge of the topology and the channel statistics is an area of future research.

APPENDIX

We start this section with a note on the notations used. On the probability space , we use notation to denote the indicator random variable (with respect to ), such that for all for all , and for all . For a vector , we use to denote the th element of the vector. Let denote the weighted max-norm with positive weight vector , i.e., . We denote the vector in with all components equal to 1 by . We also use the notation to represent the first random elements of the random sequence .

A. Proof of Lemma 1

Lemma 1: Let:
J1) for all ;
J2) .
Then, the sequence obtained by the stochastic recursion (2)

converges to almost surely.

To prove Lemma 1, we note that the adaptive computation given by (2) utilizes a stochastic approximation algorithm to solve the MDP associated with Problem (P). To study the convergence properties of this stochastic approximation, we appeal to known results in the intersection of learning and stochastic approximation given below.

In particular, consider a set of stochastic sequences on , denoted by , and the corresponding filtration , i.e., the increasing σ-field generated by , satisfying the following recursive equation:

where is a mapping from into and , is a vector of possibly delayed components of . If no information is outdated, then for all and . The following important result on the convergence of is provided in [9].

Fact 2 [9, Theorem 2]: Assume and satisfy the following conditions.
G1) For all and a.s.; for a.s.; for a.s.
G2) is a martingale difference with finite second moment, i.e., , and there exist constants and such that .
G3) There exists a positive vector , scalars and , such that
G4) Mapping satisfies the following properties.
  1) is componentwise monotonically increasing.
  2) is continuous.
  3) has a unique fixed point .
  4) , for any .
G5) For any as .
Then, the sequence of random vectors converges to the fixed point almost surely.

Let be the increasing σ-field generated by random vectors . Let be the random vector of
dimension , generated via recursive equation (2). Furthermore

Let be a random vector whose th element is constructed as follows:

where , and is the most recent state visited by node .

Now, we can rewrite (2) and (3) in the form investigated in Fact 2, i.e.,

The remaining steps of the proof reduce to verifying statements G1–G5. This is verified in Lemma 4.

Lemma 4: satisfy conditions G1–G5.

Proof:
• (G1): It is shown in Lemma 6 that algorithm d-AdaptOR guarantees that every state-action is attempted infinitely often (i.o.). Hence

for all visited i.o.

However

• (G2):

Thus Assumption (G2) of Fact 2 is satisfied.
• (G3): Let denote the set of states that contain the destination node . Moreover, let . Let be the hitting time associated with set and policy , i.e., . Policy is said to be proper if . Let us now fix a proper deterministic stationary policy . Existence of such a policy is guaranteed from the connectivity between 0 and . Let be the termination state that is reached after taking the termination action . Let us define a policy dependent operator

(10)

We then consider a Markov chain with states and with the following dynamics: From any state , we move to state , with probability . Thus, subsequent to the first transition, we are always at a state of the form , and the first two components of the state evolve according to policy . As is assumed proper, it follows that the system with states also evolves according to a proper policy. We construct a matrix with each entry corresponding to the transition from state to with value equal to for all . Since policy is proper, the maximum eigenvalue of matrix is strictly less than 1. As is a nonnegative matrix, the Perron–Frobenius theorem guarantees the existence of a positive vector with components and some such that

(11)

From (11), we have a positive vector such that , where is the fixed point of equation . From the definition of (4) and (10), we have

. Using this and the triangle inequality, we obtain

establishing the validity of (G3).
• (G4): Assumption (G4) is satisfied by operator using the following fact:
Fact 3 [19, Proposition 4.3.1]: is monotonically increasing, continuous, and satisfies
. is a fixed point of . From (5) and (6), we obtain

(12)

Furthermore, using (5) and (12), for all

(13)

The existence of fixed point follows from (13), while uniqueness of follows from uniqueness of (Fact 1).
• (G5): Suppose as . Therefore, there exists such that for all . This means that the number of times that node has transmitted a packet is bounded by . However, this contradicts Lemma 6, which says that each state-action pair is visited i.o. Therefore, as for all , and condition (G5) holds.

Thus, Assumptions (G1)–(G5) are satisfied. Hence, from Fact 2, our iterate (2) converges almost surely to , the unique fixed point of .

Lemma 5: If policy is followed, then action is selected i.o. if state is visited i.o.

Proof: Define the random variable for any . Let be the σ-field generated by . Let for any . From the construction of the algorithm, it is clear that is measurable. Now, it is clear that under policy is independent of given and . Define

for all , if , if (14)

is visited i.o.
is visited i.o.
is visited i.o.

The next step of the proof is based on the following fact.

Fact 4 [28, Corollary 5.29] (Extended Borel–Cantelli Lemma): Let be an increasing sequence of σ-fields and let be -measurable. If , then .

Thus, from Fact 4, is visited i.o. if is visited i.o.

Lemma 6: If policy is followed, then each state-action is visited infinitely often.

Proof: We say states communicate if there exists a sequence of actions such that the probability of reaching state from state following the sequence of actions is greater than zero. Using Lemma 5, if state is visited i.o., then every action is chosen i.o. as the set is finite. Hence, states such that , are visited i.o. if is visited i.o. By Lemma 5, every action is also visited i.o. Following a similar argument and repeated application of Lemma 5, every state that communicates with state and actions is visited i.o.

Under the assumption of the packet generation process in Section II, a packet is generated i.o. at the source node 0. Thus, state is reached i.o. The construction of set is such that every state communicates with state . Thus, each is visited i.o. since is finite.

B. Proof of Lemma 2

Lemma 2: For any (P)-admissible policy for Problem (P) and for all

Proof: To prove the lemma, we refer to the Auxiliary Problem (AP). In this problem we have assumed the existence of a centralized controller with full knowledge of the local broadcast model. Mathematically speaking, let be the sample space of the random probability measures for the local broadcast model. Specifically, is a nonsquare left stochastic matrix . Moreover, let be the trivial σ-field generated by the local broadcast model (sample point in ), i.e., .¹¹ Recall that denotes the set of nodes that have received the packet due to transmission from node at time , while denotes the corresponding routing decision node takes at time .¹² For Auxiliary Problem (AP), a routing policy is a collection of routing decisions taken for all for nodes at the centralized controller, where denotes a sequence of random actions for node . The routing policy is said to be (AP)-admissible for Auxiliary Problem (AP) if the event belongs to the product σ-field .

From Fact 1, since is the optimal policy for one packet, for each packet and for any feasible policy

(15)

¹¹ σ-field captures the knowledge of the realization of local broadcast model and assumes a well-defined prior on these models.
¹² if node does not transmit at time .
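The exploration argument behind Lemma 5 and Fact 4 can be illustrated numerically. The sketch below is not the paper's algorithm: it assumes, purely for illustration, that on the n-th visit to a state a node explores with probability n^(-decay) and then picks one of `num_actions` actions uniformly at random. The function name `selection_mass` and the parameters `num_actions` and `decay` are hypothetical. Under that assumption, the conditional probabilities of selecting a fixed action sum to a scaled harmonic-type series, and the extended Borel–Cantelli lemma needs that sum to diverge.

```python
import math


def selection_mass(num_visits: int, num_actions: int, decay: float) -> float:
    """Sum over visits of the probability that one fixed action is explored.

    Assumed schedule (illustrative, not taken from the paper): on the n-th
    visit the node explores with probability n**(-decay) and then picks one
    of `num_actions` actions uniformly at random.
    """
    return math.fsum(
        1.0 / (n ** decay * num_actions) for n in range(1, num_visits + 1)
    )


if __name__ == "__main__":
    for visits in (10, 1_000, 100_000):
        slow = selection_mass(visits, num_actions=3, decay=1.0)
        fast = selection_mass(visits, num_actions=3, decay=2.0)
        # The decay=1 sums keep growing; the decay=2 sums level off.
        print(f"{visits:>7d}  decay=1: {slow:.3f}  decay=2: {fast:.3f}")
```

With `decay=1.0` the partial sums grow roughly logarithmically and without bound, which is the divergence condition the lemma relies on. With `decay=2.0` they stay bounded, so under such a fast-decaying schedule the same argument would no longer guarantee that every action is selected infinitely often.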
The inequality in (15) follows from the fact that . The remaining steps are straightforward

C. Proof of Lemma 3

Lemma 3: For any

Proof: From (5), (6), and (12), we obtain the following equality for all :

(16)

Let

Lemma 1 implies that, in an almost sure sense, there exists packet index such that for all

In other words, from time onwards, given any node and set , the probability that d-AdaptOR chooses an action such that

is upper-bounded by . Furthermore, since (Lemma 6), for a given , with probability 1, there exists a packet index such that for all . Let . For all packets with index , the overall expected reward is upper-bounded by and lower-bounded by , hence their presence does not impact the expected average per-packet reward. Consequently, we only need to consider the routing decisions of policy for packets .

Consider the th packet generated at the source. Let be an event for which there exist instances when d-AdaptOR routes packet differently from the possible set of optimal actions. Mathematically speaking, event occurs iff there exist instances such that for all

where is the set of nodes that have successfully received packet at time due to transmission from node . We call event a misrouting of order .

Now for packets , let us consider the expected differential reward under policies and

(17)
(18)
(19)

where . Inequality (17) is obtained by noticing that the maximum loss in the reward occurs if algorithm d-AdaptOR decides to drop packet (no reward) while there exists a node in the set of potential forwarders such that .

Thus, for all , the expected average per-packet reward under policy is bounded as

ACKNOWLEDGMENT

The authors would like to thank A. Plymoth and P. Johansson for the valuable discussions. They are grateful to the anonymous reviewers who provided thoughtful comments and constructive critique of the paper.

REFERENCES

[1] C. Lott and D. Teneketzis, “Stochastic routing in ad hoc wireless networks,” in Proc. 39th IEEE Conf. Decision Control, 2000, vol. 3, pp. 2302–2307.
[2] P. Larsson, “Selection diversity forwarding in a multihop packet radio network with fading channel and capture,” Mobile Comput. Commun. Rev., vol. 2, no. 4, pp. 47–54, Oct. 2001.
[3] M. Zorzi and R. R. Rao, “Geographic random forwarding (GeRaF) for ad hoc and sensor networks: Multihop performance,” IEEE Trans. Mobile Comput., vol. 2, no. 4, pp. 337–348, Oct.–Dec. 2003.
[4] S. Biswas and R. Morris, “ExOR: Opportunistic multi-hop routing for wireless networks,” Comput. Commun. Rev., vol. 35, pp. 33–44, Oct. 2005.
[5] S. Jain and S. R. Das, “Exploiting path diversity in the link layer in wireless ad hoc networks,” in Proc. 6th IEEE WoWMoM, Jun. 2005, pp. 22–30.
[6] C. Lott and D. Teneketzis, “Stochastic routing in ad hoc networks,” IEEE Trans. Autom. Control, vol. 51, no. 1, pp. 52–72, Jan. 2006.
[7] E. M. Royer and C. K. Toh, “A review of current routing protocols for ad hoc mobile wireless networks,” IEEE Pers. Commun., vol. 6, no. 2, pp. 46–55, Apr. 1999.
[8] T. Javidi and D. Teneketzis, “Sensitivity analysis for optimal routing in wireless ad hoc networks in presence of error in channel quality estimation,” IEEE Trans. Autom. Control, vol. 49, no. 8, pp. 1303–1316, Aug. 2004.
[9] J. N. Tsitsiklis, “Asynchronous stochastic approximation and Q-learning,” in Proc. 32nd IEEE Conf. Decision Control, Dec. 1993, vol. 1, pp. 395–400.
[10] J. Boyan and M. Littman, “Packet routing in dynamically changing networks: A reinforcement learning approach,” in Proc. NIPS, 1994, pp. 671–678.
[11] J. W. Bates, “Packet routing and reinforcement learning: Estimating shortest paths in dynamic graphs,” 1995, unpublished.
[12] S. Choi and D. Yeung, “Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control,” in Proc. NIPS, 1996, pp. 945–951.
[13] S. Kumar and R. Miikkulainen, “Dual reinforcement Q-routing: An on-line adaptive routing algorithm,” in Proc. Smart Eng. Syst., Neural Netw., Fuzzy Logic, Data Mining, Evol. Program., 2000, pp. 231–238.
[14] S. S. Dhillon and P. Van Mieghem, “Performance analysis of the AntNet algorithm,” Comput. Netw., vol. 51, no. 8, pp. 2104–2125, 2007.
[15] P. Purkayastha and J. S. Baras, “Convergence of Ant routing algorithm via stochastic approximation and optimization,” in Proc. IEEE Conf. Decision Control, 2007, pp. 340–354.
[16] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
[17] S. Chachulski, M. Jennings, S. Katti, and D. Katabi, “Trading structure for randomness in wireless opportunistic routing,” in Proc. ACM SIGCOMM, 2007, pp. 169–180.
[18] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley, 1994.
[19] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Belmont, MA: Athena Scientific, 1997.
[20] W. Stallings, Wireless Communications and Networks, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2004.
[21] J. Bicket, D. Aguayo, S. Biswas, and R. Morris, “Architecture and evaluation of an unplanned 802.11b mesh network,” in Proc. ACM MobiCom, Cologne, Germany, 2005, pp. 31–42.
[22] M. Kurth, A. Zubow, and J. P. Redlich, “Cooperative opportunistic routing using transmit diversity in wireless mesh networks,” in Proc. IEEE INFOCOM, Apr. 2008, pp. 1310–1318.
[23] J. Doble, Introduction to Radio Propagation for Fixed and Mobile Communications. Boston, MA: Artech House, 1996.
[24] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2003.
[25] R. Parr and S. Russell, “Reinforcement learning with hierarchies of machines,” in Proc. NIPS, 1998, pp. 1043–1049.
[26] P. Gupta and T. Javidi, “Towards throughput and delay optimal routing for wireless ad-hoc networks,” in Proc. Asilomar Conf., Nov. 2007, pp. 249–254.
[27] M. J. Neely, “Optimal backpressure routing for wireless networks with multi-receiver diversity,” in Proc. CISS, Mar. 2006, pp. 18–25.
[28] L. Breiman, Probability. Philadelphia, PA: SIAM, 1992.
[29] S. Resnick, A Probability Path. Boston, MA: Birkhäuser, 1998.

Abhijeet A. Bhorkar received the B.Tech. and M.Tech. degrees in electrical engineering from the Indian Institute of Technology, Bombay, India, both in 2006, and is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of California, San Diego.
His research interests are primarily in the areas of stochastic control and estimation theory, information theory, and their applications in the optimization of wireless communication systems.

Mohammad Naghshvar (S’10) received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 2007, and is currently pursuing the M.S./Ph.D. degrees in electrical and computer engineering at the University of California, San Diego.
His research interests include stochastic control theory, network optimization, and wireless communication.

Tara Javidi (S’96–M’02) studied electrical engineering at the Sharif University of Technology, Tehran, Iran, from 1992 to 1996. She received the M.S. degrees in electrical engineering (systems) and applied mathematics (stochastics) and the Ph.D. degree in electrical engineering and computer science from the University of Michigan, Ann Arbor, in 1998, 1999, and 2002, respectively.
From 2002 to 2004, she was an Assistant Professor with the Electrical Engineering Department, University of Washington, Seattle. She joined the University of California, San Diego, in 2005, where she is currently an Associate Professor of electrical and computer engineering. Her research interests are in communication networks, stochastic resource allocation, and wireless communications.
Dr. Javidi was a Barbour Scholar during the 1999–2000 academic year and received an NSF CAREER Award in 2004.

Bhaskar D. Rao (S’80–M’83–SM’91–F’00) received the B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of Technology, Kharagpur, India, in 1979, and the M.S. and Ph.D. degrees from the University of Southern California, Los Angeles, in 1981 and 1983, respectively.
Since 1983, he has been with the University of California, San Diego, where he is currently a Professor with the Department of Electrical and Computer Engineering. His interests are in the areas of digital signal processing, estimation theory, and optimization theory, with applications to digital communications, speech signal processing, and human–computer interactions.
Dr. Rao has been a Member of the Statistical Signal and Array Processing Technical Committee of the IEEE Signal Processing Society. He is currently a Member of the Signal Processing Theory and Methods Technical Committee.