IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,              VOL. 23,      NO. 4,    APRIL 2012                    ...
644                                          IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,         VOL. 23,   NO....
646                                         IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,    VOL. 23,   NO. 4,   ...
:                ð6Þ                                                                                                 Dsort...
is used to adjust fij to a morethe highest P values. There may be some overlap between reasonable value. The proposed sche...
nearest node that has dk multiplied by the data size, and By controlling
Upcoming SlideShare
Loading in …5

Balancing the trade offs between query delay and data availability in mane ts.bak


Published on

1 Comment
  • ss
    Are you sure you want to  Yes  No
    Your message goes here
  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Balancing the trade offs between query delay and data availability in mane ts.bak

  1. 1. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 4, APRIL 2012 643 Balancing the Trade-Offs between Query Delay and Data Availability in MANETs Yang Zhang, Student Member, IEEE, Liangzhong Yin, Jing Zhao, and Guohong Cao, Fellow, IEEE Abstract—In mobile ad hoc networks (MANETs), nodes move freely and link/node failures are common, which leads to frequent network partitions. When a network partition occurs, mobile nodes in one partition are not able to access data hosted by nodes in other partitions, and hence significantly degrade the performance of data access. To deal with this problem, we apply data replication techniques. Existing data replication solutions in both wired or wireless networks aim at either reducing the query delay or improving the data availability, but not both. As both metrics are important for mobile nodes, we propose schemes to balance the trade-offs between data availability and query delay under different system settings and requirements. Extensive simulation results show that the proposed schemes can achieve a balance between these two metrics and provide satisfying system performance. Index Terms—Data replication, data availability, query delay, mobile ad hoc network (MANET). Ç1 INTRODUCTIONI N mobile ad hoc networks (MANETs), since mobile nodes replicating most data locally can reduce the query delay, but move freely, network partition may occur, where nodes it reduces the data availability since many nodes may endin one partition cannot access data held by nodes in other up replicating the same data locally, while other data itemspartitions. Thus, data availability (i.e., the number of are not replicated by anyone. To increase the data avail-successful data accesses over the total number of data ability, nodes should not replicate the same data thataccesses) in MANETs is lower than that in conventional neighboring nodes already have. However, this solutionwired networks. Data replication has been widely used to may increase the query delay since some nodes may not beimprove data availability in distributed systems, and we able to replicate the most frequently accessed data, and have http://ieeexploreprojects.blogspot.comwill apply this technique to MANETs [1]. By replicating to access it from neighbors. Although the delay of accessingdata at mobile nodes which are not the owners of the the data from neighbors is shorter than that from the dataoriginal data, data availability can be improved because owner, it is much longer than accessing it locally.there are multiple replicas in the network and the In this paper, we propose new data replication techni-probability of finding one copy of the data is higher. Also, ques to address query delay and data availability issues. Asdata replication can reduce the query delay since mobile both metrics are important for mobile nodes, we proposenodes can obtain the data from some nearby replicas. techniques to balance the trade-offs between data avail-However, most mobile nodes only have limited storage ability and query delay under different system settings andspace, bandwidth, and power, and hence it is impossible for requirements. Simulation results show that the proposedone node to collect and hold all the data considering these schemes can achieve a balance between these two metricsconstraints. By taking these issues into consideration, we and provide satisfying system performance.expect that mobile nodes should not be able (or willing) to The rest of the paper is organized as follows: The nextreplicate all data items in the network (more discussions in section presents some preliminaries of data replication. InAppendix A, which can be found on the Computer Society Section 3, we describe the proposed schemes in detail.Digital Library at Section 4 evaluates the proposed schemes through extensive10.1109/TPDS.2011.222). simulations and Section 5 concludes the paper. One solution to improve the data access performanceconsidering the resource constraints of mobile nodes is to letthem cooperate with each other; That is, contribute part of 2 DATA REPLICATIONtheir storage space to hold data of others [2], [3]. When a Data replication has been extensively studied in thenode only replicates part of the data, there will be a trade-off web environment and distributed database systems (Seebetween query delay and data availability. For example, Appendix B, available in the online supplemental material, for detailed literature review). However, most of them either do not consider the storage constraint or ignore the. The authors are with the Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802. link failure issue. Before addressing these issues by E-mail: {yangzhan, yin, jizhao, gcao} proposing new data replication schemes, we first introduceManuscript received 2 Nov. 2010; revised 8 July 2011; accepted 12 Aug. 2011; our system model.published online 18 Aug. 2011. In a MANET, mobile nodes collaboratively share data.Recommended for acceptance by R. Baldoni. Multiple nodes exist in the network and they send queryFor information on obtaining reprints of this article, please send e-mail, and reference IEEECS Log Number TPDS-2010-11-0655. requests to other nodes for some specified data items. EachDigital Object Identifier no. 10.1109/TPDS.2011.222. node creates replicas of the data items and maintains the 1045-9219/12/$31.00 ß 2012 IEEE Published by the IEEE Computer Society
  2. 2. 644 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 4, APRIL 2012 TABLE 1 Access Probability to Data Items Fig. 1. The data availability under different link failure probabilityreplicas in its memory (or disk) space. During data between N1 and N2 .replication, there is no central server that determines theallocation of replicas, and mobile nodes determine the data . C: the memory size of each mobile node for hostingallocation in a distributed manner. data replicas. The MANET studied in this paper can be represented as . tij : the delay of transmitting a data item of unit sizean undirected graph GðV ; EÞ where the set of vertices V between node Ni and Nj .represent the mobile nodes in the network, and E & V Â V is . fij : the link failure probability between node Ni andthe set of edges in the graph, which represents the physical Nj .or logical links between the mobile nodes. Two nodes that . aij : the data access frequency of node Ni to dj .can communicate directly with each other are connected byan edge in the graph. Let N denote a network of m mobile 3 THE PROPOSED DATA REPLICATION SCHEMESnodes, N1 ; N2 ; . . . ; Nm and let D denote a collection of n data In this section, we propose several schemes to address theitems d1 ; d2 ; . . . ; dn distributed in the network. For each pair data replication problem based on heuristics. Beforeof mobile nodes Ni and Nj , let tij denote the delay of presenting these heuristics, we first use an example totransmitting a data item of unit-size between these two illustrate the basic ideas.nodes. Similar to [4], we assume that the delay functiondefines a metric space; that is, they are nonnegative, 3.1 A Motivating Examplesymmetric, and satisfy the triangle inequality. Links be- Suppose a network has only two nodes N1 and N2 . Thesetween mobile nodes may fail and the link failure probability two nodes may access four data items d1 ; . . . ; d4 with equal http://ieeexploreprojects.blogspot.combetween Ni and Nj is denoted as fij , which is equal to fji as size, and each node only has enough space to host two datawe assume symmetric links. The failed links may cause items. Similar to [8], we assume that the access probabilitynetwork partitions. Queries generated during network of a mobile node to a data item is available. Thesepartition may fail because the requested data items are not probabilities are listed in Table 1.available in the partition to which the requester belongs. According to the Dynamic Access Frequency and Each node maintains some amount of data locally and Neighborhood (DAFN) scheme proposed by Hara [8],the node is called the original owner of the data. Each data neighboring nodes should try to remove duplicated dataitem has one and only one original owner. For simplicity, items to save storage space and increase data availability. Inwe assume that data items are not updated, and similar the first replication step, nodes replicate the data that theytechniques used in [5], [6], and [7] can be used to extend the are interested in, and hence both nodes replicate d1 and d2proposed scheme to handle data update or data consistency locally. In the second step of DAFN, when two neighboringissues. To improve the data availability, these data items nodes have the same data item di , the node that has a lowermay be replicated to other nodes. Because of limited access probability should replace di with the next mostmemory size, each node can only host CðC < nÞ replicas frequently accessed data. Therefore, N1 replaces d2 with d3besides its original data. The data replication problem, and N2 replaces d1 with d4 . The final replication result is: N1either optimizing the query delay or optimizing the hosts d1 and d3 whereas N2 hosts d2 and d4 .availability, has been proved to be a reduction from the From this example and verified by simulations in [8],metric uncapacitated facility location problem, which is DAFN is a good scheme because duplicated data can beknown to be NP-hard (see Appendix C, available in the removed from neighboring nodes and the memory size canonline supplemental material). Therefore, instead of trying be used effectively. However, the data availability may beto find a complex algorithm that is not practical to solve or affected when the link failure probability is high. Fig. 1approximate the problem, we use heuristics that can illustrates the average data availability as a function of linkprovide satisfying performance with much less computa- failure probability. As can be seen, the average data availability of DAFN drops as the link failure probabilitytion overhead. The following notations are used in this paper. increases. To address the weakness of DAFN, we can design a new . N : the set of mobile nodes in the network. scheme, denoted as “Our” in Fig. 1. In our scheme, both N1 . m: the total number of mobile nodes. and N2 host d1 and d2 due to their high data access . D: the set of available data items in the network. frequency. As shown in Fig. 1, the data availability in our . n: the total number of data items. scheme becomes higher than that of DAFN when the link . si : the size of di . failure probability is higher than 0.25. Another advantage of
  3. 3. ZHANG ET AL.: BALANCING THE TRADE-OFFS BETWEEN QUERY DELAY AND DATA AVAILABILITY IN MANETS 645our scheme is the low-query delay. Since the data items are Suppose node Ni and Nj are neighboring nodes. Nibuffered locally, the query delay of our simple scheme calculates the combined access frequency value of Ni and Nj kshould be much shorter than that of DAFN when accessing to data item dk at Ni , denoted as CAFij , by using thed1 or d2 . following function: In this example, the simple solution outperforms the kDAFN scheme because DAFN does not consider two CAFij ¼ ðaik þ ajk  ð1 À fij ÞÞ=si : ð1Þimportant factors: the link stability between mobile nodes Similarly Nj calculates its combined access frequency toand the query delay. Due to the complexity of the data d with the following function: kreplication problem shown in Appendix C, available in theonline supplemental material, we propose some heuristics. k CAFji ¼ ðajk þ aik  ð1 À fij ÞÞ=si : ð2Þ Heuristics. Because mobile nodes have limited memory,it is impossible for them to hold all their interested data We also need to consider the increased data availabilityitems. As a result, they have to rely on other nodes to get due to neighboring nodes. If the neighboring node Nj of Nisome data. If mobile nodes only host their interested data, it has already replicated the data and the link failureis possible that some data items are replicated by every probability between Ni and Nj is low, Ni is less likely tonode while some other data items are not replicated by replicate this data because it can always get the data from Nj .anyone. Therefore, it is important for mobile nodes to However, if the link failure probability is high, Ni may like tocooperate with each other and contribute part of their replicate the data locally. Therefore, we define a prioritymemory to hold data for other nodes. The problem is to value for node Ni to replicate data dk given its neighboring kdetermine the memory space that a mobile node should node Nj , denoted as P ij , by using the following function:contribute because bad cooperation may actually degradethe performance, as shown in the above example. P k ¼ CAFij  !k ; ij k ij ð3Þ We have the following heuristics: For a mobile node, if its where !k indicates the impact on data availability by the ijcommunication links to other nodes are stable, more neighboring node and the link failure probability. The valuecooperation with these nodes can improve the data of !k is calculated as follows: ijavailability; if the links to other nodes are not very stable, it is better for the node to host most of the interested data k fij if data dk is replicated at Nj ;locally. The above heuristic mainly addresses the issue of !ij ¼ 1 if data dk is not replicated at Nj :data availability. For query delay, it is better to allocate data data according to the priority valuenear the interested nodes. The degree of cooperation affects Each node sorts theboth the data availability and the query delay. In the P and picks data items with the highest P to replicate in itsfollowing, we propose various schemes to achieve various memory until no more data items can be replicated. The Pperformance goals. value function is designed so that3.2 The Greedy Data Replication Scheme 1. it considers the access frequency from a neighboringOne naive greedy data replication scheme is to allocate the node to improve data availability;most frequently accessed data items until the memory is 2. it considers the data size. If other criteria are thefull. However, this naive scheme, referred to as Greedy, same, the data item with smaller size is given higherdoes not consider the data size difference between different priority for replicating because this can improve thedata items. The data size should be considered because performance while reducing memory space;smaller data require less memory space, and hence 3. it gives high priority to local data access, and hence the interested data should be replicated locally toreplicating them can save some memory space for other improve data availability and reduce query delay;data items. Therefore, a better greedy scheme is to calculate 4. it considers the impact of data availability from thethe data access frequency of a data item dk by normalizing it neighboring node and link quality.against the data size, i.e., aik =sk . This greedy scheme, referred to as Greedy-S, lets node Thus, if the link between two neighboring nodes are stable,Ni repeatedly pick the data item with the largest aik =sk they can have more cooperations in data replication.value from the data set that has not yet been replicated at It is possible that according to OTOO, node Ni shouldNi until no more data can be replicated in the memory. host dj but Ni is separated from nodes that have dj becauseOne drawback of the greedy scheme is that it does not of network partitions. In this situation, Ni selects the nextconsider the cooperation between the neighboring nodes best candidate (data item) according to the replicationand hence its performance may be limited. We present the scheme. This rule is also applied to other replicationperformance analysis and numerical results in Appendix schemes proposed in the following. The detailed pseudo-D, available in the online supplemental material. The code and descriptions of the OTOO scheme and thefollowing sections present schemes that apply different following schemes are provided in Appendix E, availablelevels of cooperation between neighboring nodes following in the online supplemental material.our heuristics. 3.4 The Reliable Neighbor (RN) Scheme3.3 The One-To-One Optimization (OTOO) Scheme OTOO considers neighboring nodes when making dataIn this scheme, each mobile node only cooperates with at replication choices. However, it still considers its ownmost one neighbor to decide which data to replicate. access frequency as the most important factor because the
  4. 4. 646 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 4, APRIL 2012access frequency from a neighboring node is reduced by a nodes in the group to maximize the data availability andfactor of the link failure probability. To further increase the minimize the data access delay within the of cooperation, we propose the Reliable Neighbor In the RG scheme, there is no redundant replication untilscheme which contributes more memory to replicate data every data item is replicated at least once. Therefore, thefor neighboring nodes. In this scheme, part of the node’s maximum degree of cooperation within the reliable groupmemory is used to hold data for its Reliable Neighbors. For can be achieved. Because the function for selecting the bestnode Ni , a neighboring node Nj is considered to be Ni ’s node to place each data replica considers the access delayreliable neighbor if between the query node and the nearest replication node in 1 À fij > T r ; the group, the RG scheme can reduce the number of hops that the data need to be transferred to serve the query.where T r is a threshold value. Let nbðiÞ be the set of Ni ’s Due to the page limitation, the detailed protocolreliable neighbors. The total contributed memory size of Ni , description and a comprehensive performance complexitydenoted as CcðiÞ, is set to be and bound analysis of the proposed schemes are presented 0 1 in Appendices E and F, available in the online supplemental X CcðiÞ ¼ C  min@1; ð1 À fij Þ=A; ð4Þ material. Nj 2nbðiÞwhere is a system tuning factor which affects the memory 4 PERFORMANCE EVALUATIONSallocated to itself and its neighbors. In this section, we evaluate the performance of the proposed Intuitively, Ni contributes more memory if its links with schemes: OTOO, RN2 (RN with ¼ 2), RN8 (RN withneighboring nodes are more stable. The two extreme cases ¼ 8), RN16 (RN with ¼ 16), and RG by comparing themare: 1) when CcðiÞ ¼ C, Ni contributes all its memory to with the DAFN scheme [8] and the Greedy scheme throughhold data for neighboring nodes; 2) when fij ¼ 1; 8Nj 2 extensive simulations.nbðiÞ, Ni does not contribute any memory. The reasonbehind the RN scheme is that when links to neighboring 4.1 Simulation Setupnodes of Ni are stable, Ni can hold more data for We have developed a simulator based on CSIM 19 [9] toneighboring nodes as they also hold data for Ni . Because evaluate the performance of the data replication schemes.links are stable, such cooperation can improve the data At the beginning of the simulation, m nodes are placedavailability. If links are not stable, data on neighboring randomly in a 2;500 m  2;500 m area. The radio range is set http://ieeexploreprojects.blogspot.comnodes have low availability and may incur high-query to be D. If two nodes Ni and Nj are within the radio rangedelay. Thus, cooperation in this case cannot improve data (i.e., Dði; jÞ D), they can communicate with each other.availability and nodes should be more “selfish” in order to The communication link between them may fail and theachieve better performance. link failure probability fij is defined as The data replication process works as follows: Node Ni first allocates its most interested data to its memory, up to Dði; jÞ 2C À CcðiÞ memory space. Then all the rest of the data are fij ¼ Á
  5. 5. : ð6Þ Dsorted according to P to a list called the neighbor’s interestlist. The P value of Ni to dk is defined as Equation (6) is adopted according to the facts that the 0 1, wireless signal strength decreases with a rate between the X order of r2 , where r is the distance to the signal source. For k Pi ¼ @ k yj  ajk  ð1 À fij ÞA sk : ð5Þ example, if two connected nodes have a long distance, they Nj 2nbðiÞ are easier to disconnect and the link failure probability The memory space of CcðiÞ is used to allocate data with between them is higher.
  6. 6. is used to adjust fij to a morethe highest P values. There may be some overlap between reasonable value. The proposed schemes do not depend onNi ’s interested data and the allocated data interested by the failure model in (6) and they are able to work as long asNi ’s neighbors. If during the allocation, a data item is the failure probability between neighboring nodes can bealready in the memory, this data item will not be allocated estimated.again and the next data item on the neighbor’s interest list is Similar to [8], the number of data items n is set to be thechosen instead. same as the number of nodes m. Data item di ’s original host is Ni , for all i 2 ½1; mŠ. The data item size is uniformly3.5 Reliable Grouping (RG) Scheme distributed between smin and smax . Each node has a memoryOTOO only considers one neighboring node when making size of replication decisions. RN further considers all one-hop Two access patterns are used in the simulation.neighbors. However, the cooperations in both OTOO and RNare not fully exploited. To further increase the degree of 1. All nodes follow the Zipf-like access pattern, butcooperation, we propose the reliable grouping scheme which different nodes have different hot data items. This isshares replicas in large and reliable groups of nodes, whereas done by randomly selecting an offset value for eachOTOO and RN only share replicas among neighboring node Ni : offseti , which is between 1 and n À 1. Thenodes. The basic idea of the RG scheme is that it always picks actual access probability of Ni to data item dk isthe most suitable data items to replicate on the most suitable given by
  7. 7. ZHANG ET AL.: BALANCING THE TRADE-OFFS BETWEEN QUERY DELAY AND DATA AVAILABILITY IN MANETS 647 TABLE 2 Simulation Parameters Fig. 2. Performance of RN and RG as a function of T r when nodes have the same access pattern. 1 Pik ¼ Pn : 95 percent confidence interval for the measured data is less ðððk þ n À offseti Þ%n þ 1Þ 1 j¼1 j than 10 percent of the sample mean. This means that the most frequently accessed data item id is moved to be offseti instead of 1; the 4.2.1 Fine-Tuning T r second frequently accessed data item id is offseti þ In Fig. 2, we evaluate the effects of T r , which affects the 1 instead of 2, and so on. number of cooperative neighbors in the RN scheme and 2. All nodes have the same access pattern and they the RG scheme. Larger T r results in smaller number of have the same access probability to the same data cooperative neighbors, and vice versa. We can see that T r item. has more significant effects on the performance of RN2 In order to avoid routing cycles on the query path, a (RN with ¼ 2) than RN8 and RN16, because RN2maximal hop count is used ffiffi limit the number of hops for contributes the largest portion of the memory size to p toeach query. It is set to be 2Á2500 , where 2,500 is the size of neighbors. The performance of the RG scheme is also Dthe simulation area. affected by T r , because the change of T r affects the The performance metrics used in the simulation are number of nodes in a reliable group. From Fig. 2a, we canmainly data availability and query delay. The amount of see that when T r 0:4, as long as T r increases, the dataquery traffic is also evaluated to show the protocol availability of RG, RN8, and RN16 are decreasing while theoverhead. Here, we note that the amount of query traffic data availability of RN2 is increasing; when 0:4 T r 0:6, all schemes have a decreasing trend in data availability ascan also be used as a metric of system power consumption. T r increases; similar trend can be found when T r 0:6. InThis is because in wireless communication, data transmis- Fig. 2b, when T r changes from 0.2 to 0.4, RG and RN2 havesion is the key factor affecting the system power consump- a large decrease in query delay; however, when T rtion compared to other factors such as disk or CPU becomes larger than 0.4, all four schemes have stable andoperations [10]. Therefore, if there is more query traffic, small delay decrease. When T r is around 0.6, all schemesmore energy consumption is expected. If one replication have relatively stable performance, which means thescheme generates less traffic, it is more power efficient. change of T r does not have significant effect on the When a query for data dk is generated by node Ni , if dk relative performance of different data replication schemes.can be found locally, or at a node that is reachable through Thus, we use T r ¼ 0:6 in the following.single or multihops, this access is considered successful.The query delay is the number of hops from Ni to the 4.2.2 Fine-Tuning
  8. 8. nearest node that has dk multiplied by the data size, and By controlling
  9. 9. , the link failure probability can be adjusted.query traffic is defined as all messages involved to serve the When the link failure probability deceases, data availabilityquery. If dk is in the local memory of Ni , the query delay increases as shown in Fig. 3. We choose
  10. 10. ¼ 0:95 to achieveand query traffic are both 0. Most system parameters are a balance among all replication schemes.listed in Table 2. As can be seen from the figure, DAFN has high-query4.2 Simulation Results delay because it tries to avoid duplicated data amongExperiments were run using different workloads andsystem settings. The performance analysis presented hereis designed to compare the effects of different workloadparameters such as Zipf parameter, network size, radiorange, memory size, and node mobility (due to the spacelimitation, the effects of different mobility models areprovided in Appendix G, available in the online supple-mental material). For each workload parameter (e.g., themean update arrival time or the mean query generate time),the mean value of the measured data is obtained bycollecting a large number of samples such that the Fig. 3. Performance as a function of
  11. 11. when nodes have the sameconfidence interval is reasonably small. In most cases, the access pattern.
  12. 12. 648 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 4, APRIL 2012Fig. 4. Performance as a function of when nodes have different access Fig. 5. Performance as a function of when nodes have the samepattern. access pattern.neighboring nodes. Even if a data item is popular among the relation of their data availability is RG RN2 RN8 RN16 % OT OO (RG performs the best as expected) whiletwo neighboring nodes, it is still allocated at only one of theneighboring nodes. Therefore, many accesses have to be the relations of their query delay is RG RN2 RN8 satisfied by the querying neighboring nodes, which RN16 OT OO (OTOO performs the best). This clearlyincreases the query delay. For similar reasons, the query shows the trade-offs between these two performancedelay of RG is also high. However, RG considers all nodes metrics. Higher degree of cooperation improves the datain a reliable group during data replication. It organizes dataavailability, but it also increases the query delay becausebetter within each reliable group, which helps RG achieve more data items need to be retrieved from neighboringhigher data availability. nodes. This figure also gives us directions on how to achieve certain performance goals. If high data availability is4.2.3 Effects of the Zipf Parameter () required, nodes should be more cooperative with neighbor-In this section, we evaluate the effects of the Zipf ing nodes so that more data can be replicated in the network.parameter on the system performance. As increases, If low-query delay is more important, nodes should be moremore accesses focus on hot data items and data availability “selfish” so that requests can be served locally instead of byis expected to increase. neighboring nodes. Fig. 4 demonstrates the effects of the Zipf parameter on Since RN2, RN8, and RN16 exhibit similar performancethe system performance when nodes have different access when other parameters change, to make the simulation http://ieeexploreprojects.blogspot.compattern. Fig. 4a shows that the proposed schemes outper- figures clear, we will only use RN8 to represent the RNform the DAFN scheme in terms of data availability in most schemes.cases. The reasons are as follows: first, our schemesconsider the link failure probability when replicating data 4.2.4 Effects of the Number of Nodes in the Network (m)(for OTOO and RN) or organizing groups (for RG); second, The number of nodes in the network indicates the nodethe OTOO and RN schemes avoid replicating data items density of the network. When the number of nodesthat are not frequently accessed by using the P value. On increases, the density of the network increases and itthe other hand, the DAFN scheme does not consider the becomes better connected and the data availability in-link failure probability and it sometimes replicates data creases. Fig. 6 shows the effects of the number of nodes onitems with low-access frequency instead of frequently the system performance. In Fig. 6a, we can see that when there are only 100 nodes in the network, all schemes haveaccessed data items, as shown in the example in Section 3.1. relatively lower data availability due to the sparse network Fig. 4b shows the query delay of different schemes. The connectivity. As the number of nodes increases, nodes haveDAFN scheme is outperformed by the proposed schemes in more opportunities to get the data from their neighboringall situations. This shows that our schemes can achieve nodes, and all schemes have performance improvements inbetter performance in terms of data availability and query terms of data availability as expected. When the networkdelay. From Fig. 4b, we can also find that the relation of density further increases, e.g., in a 500-nodes scenario, thequery delay is RG RN2 RN8 % RN16 OT OO. This data availability of all schemes approaches to 0.9. Similarshows that when nodes have different interests, to achieve a observations can be found in Fig. 6b. Therefore, we chooselow-query delay, it is better for them to host the data thatthey are interested in, and cooperation among them doesnot show significant advantage. Fig. 5 shows the effects of the Zipf parameter on thesystem performance when nodes have the same accesspattern. We can see from Fig. 5 that all the proposed schemesperform much better than the DAFN scheme in terms of dataavailability and all the proposed schemes in most situationsperform better than DAFN in terms of query delay. Greedy-Sperforms better than Greedy because it gives higher priorityto data items with smaller size, and thus more importantdata can be replicated and the performance is improved. Fig. 6. Performance as a function of number of nodes in the networkComparing RN2, RN8, RN16, OTOO, and RG, we find that when nodes have different access pattern.
  13. 13. ZHANG ET AL.: BALANCING THE TRADE-OFFS BETWEEN QUERY DELAY AND DATA AVAILABILITY IN MANETS 649Fig. 7. Performance as a function of the radio range when nodes have different access pattern.m ¼ 300 as the default setting to see the effects of different availability for OTOO, RN8, Greedy, Greedy-S, and DAFNschemes on the system performance. is not very large because when nodes have different access pattern, they can simply replicate their interested data4.2.5 Effects of the Radio Range (D) locally to achieve a high data availability. Thus, the roomFig. 7 shows the effects of the radio range on the system for improvement is small. RG, however, organizes dataperformance under different access pattern. When the radio replications within each reliable group. It can provide morerange increases, the network is better connected and the different data items in each group. Thus, its data availabilitydata availability is expected to increase. Fig. 7a shows that is much higher than other schemes.all schemes perform as expected. The proposed schemesperform much better than DAFN when the radio range issmall. When the radio range is very large, different schemes 5 CONCLUSIONShave similar data availability. This is because the network In MANETs, due to link failure, network partitions arepartition is very rare in this situation and most data can be common. As a result, data saved at other nodes may not befound in a reachable node. accessible. One way to improve data availability is through Figs. 7b and 7c show that the query delay and query data replication. In this paper, we proposed several datatraffic increase as the radio range increases. This is because replication schemes to improve the data availability and http://ieeexploreprojects.blogspot.comwhen the network is better connected, some previously reduce the query delay. The basic idea is to replicate theunavailable data can be found at faraway nodes. The most frequently accessed data locally and only rely onproposed schemes always result in lower query delay and neighbor’s memory when the communication link to themtraffic than the DAFN scheme. When the radio range is is reliable.extremely small, the query delay of all schemes reduces to Extensive performance evaluations demonstrate that thenear zero, since it is hard to find a neighbor with such small proposed schemes outperform the existing solutions inradio range and almost all requests are served locally. terms of data availability and query delay. Results also show that there is a fundamental trade-off between data4.2.6 Effects of Memory Size (C) availability and query delay. Higher degree of cooperationIn this section, we evaluate the system performance when improves the data availability, but it also increases thethe memory size (C) changes. As C increases, more data can query delay because more data need to be retrieved frombe hosted by a node and the data availability increases. neighboring nodes.Similarly, more data can be found locally as C increases andthe query delay and query traffic decrease. Fig. 8 shows that when nodes have different access ACKNOWLEDGMENTSpatterns, the proposed schemes increase the data avail- This work was supported in part by the US National Scienceability while providing lower query delay and query traffic Foundation (NSF) under grant number CNS-0721479, andcompared to the DAFN scheme. The difference of data by Network Science CTA under grant W911NF-09-2-0053.Fig. 8. Performance as a function of memory size when nodes have different access pattern.
  14. 14. 650 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 23, NO. 4, APRIL 2012REFERENCES Jing Zhao received the BS degree from Peking University, Beijing, China. He is currently work-[1] T. Hara and S.K. Madria, “Data Replication for Improving Data ing toward the PhD degree in computer science Accessibility in Ad Hoc Networks,” IEEE Trans. Mobile Computing, and engineering at the Pennsylvania State vol. 5, no. 11, pp. 1515-1532, Nov. 2006. University. His research interests include wire-[2] L. Yin and G. Cao, “Supporting Cooperative Caching in Ad Hoc less networks, distributed systems, and mobile Networks,” IEEE Trans. Mobile Computing, vol. 5, no. 1, pp. 77-89, computing, with a focus on mobile ad hoc Jan. 2006. networks.[3] B. Tang, H. Gupta, and S. Das, “Benefit-Based Data Caching in Ad Hoc Networks,” IEEE Trans. Mobile Computing, vol. 7, no. 3, pp. 289-304, Mar. 2008.[4] I. Baev and R. Rajaraman, “Approximation Algorithms for Data Guohong Cao received the BS degree from Placement in Arbitrary Networks,” Proc. 12th Ann. ACM-SIAM Xian Jiaotong University, China and the MS and Symp. Discrete Algorithms (ACM-SIAM SODA), pp. 661-670, 2001. PhD degrees in computer science from the Ohio[5] T. Hara and S. Madria, “Consistency Management Strategies for State University in 1997 and 1999, respectively. Data Replication in Mobile Ad Hoc Networks,” IEEE Trans. Mobile Since then, he has been with the Department of Computing, vol. 8, no. 7, pp. 950-967, July 2009. Computer Science and Engineering at the[6] T. Hara, “Replica Allocation in Ad Hoc Networks with Periodic Pennsylvania State University, where he is Data Update,” Proc. Int’l Conf. Mobile Data Management (MDM), currently a professor. His research interests 2002. are wireless networks and mobile computing.[7] J. Cao, Y. Zhang, G. Cao, and L. Xie, “Data Consistency for He has published more than 150 papers in the Cooperative Caching in Mobile Environments,” Computer, vol. 40, areas of wireless sensor networks, wireless network security, vehicular no. 4, pp. 60-66, Apr. 2007. ad hoc networks, data access and dissemination, and distributed fault[8] T. Hara, “Effective Replica Allocation in Ad Hoc Networks for tolerant computing. He has served on the editorial board of IEEE Improving Data Accessibility,” Proc. IEEE INFOCOM, 2001. Transactions on Mobile Computing, IEEE Transactions on Wireless[9] H. Schwetman, “Csim19: A Powerful Tool for Building System Communications, IEEE Transactions on Vehicular Technology, and has Models,” Proc. 33rd Conf. Winter Simulation, pp. 250-255, 2001. served as program committee members of many conferences. He was a[10] R. Kravets and P. Krishnan, “Power Management Techniques for recipient of the US National Science Foundation (NSF) CAREER award Mobile Communication,” Proc. ACM MOBICOM, pp. 157-168, in 2001. He is a fellow of the IEEE and the IEEE Computer Society. 1998. Yang Zhang received both the BS and ME . For more information on this or any other computing topic, degrees from Nanjing University, Nanjing, Chi- please visit our Digital Library at na, in 2002 and 2005, respectively. He is currently working toward the PhD degree at the Pennsylvania State University. His research interests include distributed systems, mobile computing, and vehicular adhoc networks. He is a student member of the IEEE.Liangzhong Yin received the PhD degree in computer science andengineering from the Pennsylvania State University, in 2005. Hisresearch interests include wireless networks and mobile computing.