Your SlideShare is downloading. ×
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Query evaluation over network of data aggregators
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Query evaluation over network of data aggregators

375

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
375
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
8
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. INTERNATIONALComputer EngineeringCOMPUTER ENGINEERING International Journal of JOURNAL OF and Technology (IJCET), ISSN 0976- 6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME & TECHNOLOGY (IJCET)ISSN 0976 – 6367(Print)ISSN 0976 – 6375(Online)Volume 4, Issue 1, January- February (2013), pp. 301-312 IJCET© IAEME: www.iaeme.com/ijcet.aspJournal Impact Factor (2012): 3.9580 (Calculated by GISI) ©IAEMEwww.jifactor.com QUERY EVALUATION OVER NETWORK OF DATA AGGREGATORS Rajesh Bharati1, Amit V. Kore2 Department of Computer Engineering, DYPIET Pimpri, Pune Department of Computer Engineering, DYPIET Pimpri, Pune ABSTRACT Continuous queries are used to monitor changes to time varying data and to provide results useful for online decision making. Typically a user desires to obtain the value of some aggregation function over distributed data items, for example, to know value of portfolio for a client; or the AVG of temperatures sensed by a set of sensors. In these queries a client specifies a coherency requirement as part of the query. Present a low-cost, scalable technique to answer continuous aggregation queries using a network of aggregators of dynamic data items. In such a network of data aggregators, each data aggregator serves a set of data items at specific coherencies. Just as various fragments of a dynamic web-page are served by one or more nodes of a content distribution network, our technique involves decomposing a client query into sub-queries and executing sub-queries on judiciously chosen data aggregators with their individual sub-query incoherency bounds. Provide a technique for getting the optimal set of sub-queries with their incoherency bounds which satisfies client querys coherency requirement with least number of refresh messages sent from aggregators to the client. For estimating the number of refresh messages, Build a query cost model which can be used to estimate the number of messages required to satisfy the client specified incoherency bound. Performance results using real-world traces show that our cost based query planning leads to queries being executed using less than one third the number of messages required by existing schemes I. INTRODUCTION These aggregation queries are long running queries as data is continuously changing and the user is interested in notifications when certain conditions hold. Thus, responses to these queries are refreshed continuously. In these continuous query applications, users are likely to tolerate some inaccuracy in the results. That is, the exact data values at the corresponding data sources need not be reported as long as the query results satisfy user specified accuracy requirements. For instance, a portfolio tracker may be happy with an accuracy of $10.i. Password should be easy to remember. 301
  • 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEData incoherency: Data accuracy can be specified in terms of incoherency of a data item, :defined as the absolute difference in value of the data item at the data source and the value known to a ceclient of the data. Let v(t) denote the value of the it" data item at the data source at time t;and let the value the data item known to the client be u(t). Then the data incoherency atthe client is given by | v(t)-u(t)| For a data item which needs to be refreshed at an u(t)|.incoherency bound C a data refresh message is sent to the client as soon as dataincoherency exceeds C, i.e., | v(t) v(t)-u(t)| > C.Network of data aggregators: Data refresh from data sources to clients can be don aggregators: doneusing push or pull based mechanisms. In a push based mechanism data sources sendupdate messages to clients on their own whereas in pull based mechanism data sourcessend messages to the client only when the client makes a request. It is assume the pushbased mechanism for data transfer between data sources and clients. For scalablehandling of push based data dissemination, network of data aggregators are pro pro-posed.In such network of data aggregators, data refreshes occur from data sources to theclients through one or more data aggregators. s Fig. 1 Data dissemination network for multiple data itemsHere it is assume that each data aggregator maintains its configured incoherency boundsfor various data items. From a data dissemination capability point of view, each data pointaggregator (DA) is characterized by a set of (d, c) pairs, where d, is a data item which theDA can disseminate at an incoherency bound c. The configured incoherency bound of adata item at a data aggregator can be maintained using any of following methods: (a)The data source refreshes the data value of the DA whenever DAs incoherency bound isabout to get violated. This method has scalability problems. (b) Data aggregator(s) withtighter incoherency bound help the DA to maintain its incoherency bound in a scalablemanner.II. PROBLEM STATEMENT Value of a continuous weighted additive aggregation query, at time t, can becalculated as: nq V q (t ) = ∑ ( V qi ( t ) × W qi ) i =1 (1) 302
  • 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEVq is the value of a client query q involving nq data items with the weight of the ith dataitem being wqit , Suppose the result for the query given by Equation (1) needs to becontinuously provided to a user at the query incoherency bound Cq• Then, thedissemination network has to ensure that: nq | ∑ ( v qi ( t ) − u qi ( t )) × w qi | ≤ C q i =1 (2) Whenever data values at sources change such that query incoherency bound isviolated, the updated value should be refreshed to the client If the network ofaggregators can ensure that the iO, data item has incoherency bound Cqit then thefollowing condition ensures that the query incoherency bound Cq is satisfied: nq ∑ (C qi × wqi ) ≤ Cq i =1 (3)III. DATA DISSEMINATION COST MODEL The presented model is to estimate the number of refreshes required to disseminate a dataitem while maintaining a certain incoherency bound. There are two primary factors affectingthe number of messages that are needed to maintain the coherency requirement: (a) The coherency requirement itself and (b) Dynamics of the dataConsider a data item which needs to be disseminated at an incoherency bound C, i.e., newvalue of the data item will be pushed if the value deviates by more than C from the lastpushed value. Thus the number of dissemination messages will be proportional to theprobability of |v (t) - u (t) |greater than C for data value vet) at the source/ aggregator and u(t)at the client, at time t. A data item can be modeled as a discrete time random process whereeach step is correlated with its previous stepa) Data Dynamics Model Specifically, the cost of data dissemination for a data item will be proportional todata sumdiff defined as Rs = ∑ | si − si −1 | i (5)Where si and si −1 are the sampled values of a data item 5 at ith and (i-1)th time instances(i.e., consecutive ticks). In practice, sumdiffvalue for a data item can be calculated at thedata source by taking running average of difference between data values for consecutiveticks. 303
  • 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEb) Incoherency Bound Model Data source pushes the data value whenever it differs from the last pushed value by anamount more than C. Client estimates data value based on server specified parameters. Thesource pushes the new data value whenever it differs from the (client) estimated value by anamount more than C In both these cases, value at the source can be modeled as a random process with averageas the value known at the client In case (b), the client and the server estimate the data value asthe mean of the modeled random process whereas in case (a) deviation from the last pushedvalue can be modeled as zero mean process. Using Chebyshevs inequality: P(| v(t ) − u(t ) |> C ) ∝ 1 / C 2 (4)Thus, it hypothesize that the number of data refresh messages is inversely proportional to thesquare of the incoherency bound.c) Combining data dissemination models Number of refresh messages is proportional to data sum- diff Rs and inverselyproportional to square of the incoherency bound. Further, it need not disseminate anymessage when either data value is not changing ( Rs = 0) or incoherency bound isunlimited (1/ C 2 =0). Thus, for a given data item 5, disseminated with an incoherencybound C, the data dissemination cost is proportional to R / C 2 . In the next section, it use thisdata dissemination cost model for developing cost model for additive aggregation queriesIV. COST MODEL FOR ADDITIVE AGGREGATION QUERIES Consider an additive query over two data items P and Q with weights Wp and Wqrespectively and then need to estimate its dissemination cost. H data items are disseminatedseparately, the query sumdiff will be: Rdata = wpRp +wqRq = wp ∑ pi − pi−1 | +wq ∑ qi −qi−1 | | | (6)Instead, if the aggregator uses the information that client is interested in a query over P and Q(rather than their individual values), it creates and pushes a composite data item (WpP+WqQ)then the query sumdiffwill be: Rquery = ∑| wp ( pi − pi−1 ) + wq (qi − qi−1 ) | (7)Rquery is clearly less than or equal compared to Rdata . Thus IT need to estimate the sumdiff of anaggregation query (i.e., Rquery ) given the sumdiff values of individual data items (i.e., Rp andRq). Only data aggregators are in a position to calculate Rquery as different data items may bedisseminated from different sources. Here develop the query cost model in two stages. 304
  • 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEa) Modeling Correlation between Data Dynamics From Equations (6) and (7) it can see that if two data items are correlated such that asthe value of one data item increases that of the other data item also increase then R query will becloser to R data . On the other hand if the data items are inversely correlated then R query will beless compared to R data . Thus, intuitively, it can represent the relationship between Rquery andsumdiff values of the individual data items using a correlation measure associated with the pairof data items. Specifically, if p is the correlation measure then Rquery can be written as: 2 (w2 Rp + wq Rq + 2ρwp Rp wq Rq ) p 2 2 2 R query ∝ (w2 + wq + 2ρwp wq ) p 2 (8)The correlation measure ρ is defined such that − 1 ≤ ρ ≤1 . So, R query will always be less than| wpR p + w q R q | and always be more than | w p R p − w q R q | .The above relation can be betterunderstood from its similarity with the standard deviation of the sum of two randomvariables. For data items P and Q, ρ can be calculated as: ρ= ∑(p i − p i −1 )(q i − q i −1 ) (9) ( ∑(p i − p i −1 ) 2 ∑ (q i − q i −1 ) 2b) Query based Normalization Suppose there is need to compare the cost of two queries: a SUM queryinvolving two data items and an AVG query involving the same set of data items. Letthe query incoherency bound for the SUM and the AVG queries be C =2C and C2=C,respectively. From Equation (8), sumdiff of the SUM query will be double that of theAVG query. Hence, query evaluation cost (as per R / C 2 ) of the SUM query will be hallthat of the AVG query. But, intuitively, disseminating the AVG of two data items at agiven incoherency bound should require the same number of refresh messages as theirSUM with double the incoherency bound. Thus, there is a need to normalize querycosts. From a query execution cost point of view, a query with weights Wj andincoherency bound C is the same as query with weights nWj and incoherency bound nc.So, while normalizing need to ensure that both, query weights and incoherency bounds,are multiplied by the same factor.Normalized query sumdiff is given by Rquery=(w2Rp +w2Rq +2ρwpRpw Rq)/(w2 +w2 +2ρwpw ) 2 p 2 q 2 q p q q (10)The value of the normalizing factor for Rquery should be 1 / w 2 + w q + 2 ρw p w q p 2The value of the incoherency bound has to be adjusted by the same factor. Normalizationensures that queries with arbitrary values of weights can be compared for execution cost 305
  • 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEestimates. Equation (10) can be extended to get query sumdiff for any general weightedaggregation query given by Equation (1) as: nq nq nq ∑ w qi R i2 + 2 ∑ ∑ ρ ij w qi w qj R i R j 2 i =1 i =1 j =1, j ≠ i RQ = nq nq nq ∑ i=1 2 w qi + ∑ ∑ i =1 ρ ij w qi w qj j =1, j ≠ iV. QUERY PLANNING FOR WEIGHTED ADDITIVE AGGREGATION QUERIES For executing an incoherency bounded continuous query, a query plan is required.The query planning problem can be stated as:Inputs:(1) A network of data aggregators in the form of a relation f (A, D, C) specifying the Ndata aggregators ak ∈ A(1 ≤ k ≤ N ) , set Dk ⊆ D of data items disseminated by the dataaggregator a k and incoherency bound t kj , which the aggregator a k can ensure for eachdata item d kj ∈ D k(2) Client query q and its incoherency bound C q . An additive aggregation query q can berepresented as ∑w d qi qiWhere wqi ; is the weight of the data item d qi ; for 1 ≤ i ≤ nqOutputs:(1) q k , for (1 ≤ k ≤ N ) , i.e., sub-query for each data aggregator ak .(2) C qk , for (1 ≤ k ≤ N ) i.e., incoherency bounds for all the sub queries. Thus, to get aquery plan here need to perform following tasks:1. Determining sub-queries: For the client query q get sub queries q k s for each data aggregator.2. Dividing incoherency bound: Divide the query incoherency bound C q , among sub-queries to get C qk s .For optimal query planning, above tasks are to be performed with the followingobjective and constraints:Optimization objective: Number of refresh messages is minimized. For a sub query q k , theestimated number of refresh messages is given by kR / C where R qk is the sumdiff of qk 2 qkthe sub query q k , C qk is the incoherency bound assigned to it and k, the proportionalityfactor, is the same for all sub queries of a given query q. Thus total number of refreshmessages is estimated as: N R qk Z q = k∑ 2 k =1 C qk (12) 306
  • 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEHence Z , needs to be minimized for minimizing the number of refreshes. qConstraint 1: q k is executable at ak : Each DA has the data items required to execute the sub-query allocated to it, i.e., for each data item d qki required for the sub query q , d qki ∈ D k kConstraint2: Query incoherency bound is satisfied: Query incoherency should be less than orequal to the query incoherency bound. For additive aggregation queries, value of theclient query is the sum of sub-query values. As different sub-queries are disseminatedby different data aggregators, need to ensure that sum of sub-query incoherencies is lessthan or equal to the query incoherency bound. Thus ∑C qk ≤ Cq (13)Constraint3: Sub-query incoherency bound is satisfied: Data incoherency bounds at ak ( t kj ford qki should be such that the sub-query incoherency bound C qk can be satisfied at ∈ D kthat DA. The tightest incoherency bound T qk which the data aggregator a k can satisfy forthe given sub-query q k can be calculated as Tqk = ∑ (Wqi × t qj | d qi ≡ d qj ) nqkFor satisfying this constraint 1 ensure C qk > T qkFollowing is the outline of my approach for solving this constraint optimization problemas detailed in the rest of this chapter: it proves that determining sub-queries whileminimizing Z q , as given by Equation (12), is NP hard. If the set of sub-queries ( q k ) isalready given, sub-query incoherency bounds C qk s can be optimally determined tominimize Z q . As optimally dividing the query into sub queries is NP-hard and there is noknown approximation algorithm, it present two heuristics for determining sub-queries whilesatisfying as many constraints as possible (Constraint1 and Constraint2 to be precise).Then it present variation of the two heuristics for ensuring that sub-query incoherencybound is satisfied (Constraint3). In particular, to get a solution of the query planningproblem, the heuristics presented are used for determining sub-queries. Then, using theset of sub-queries, the method outlined is used for dividing in coherency boundVI. OPTIMAL ALLOCATION OF QUERY INCOHERENCY BOUNDAMONG SUB -QUERIES If know the division of the client query into sub queries, using Equation (11) cancalculate sumdiffvalues of all the sub-queries. Thus, it need to minimize Z, given by Equation(12) subject to Constaint2 (query incoherency bound is satisfied) and Constaint3 (sub-queryincoherency bound is satisfied). Then it can get a close form expression by solving Equation (12)with Equation (13) using Lagrange Multiplier scheme. In that scheme minimizes N N 2 ∑ (R k =1 qk / C qk ) + λ (∑ C qk − C q ) k =1 307
  • 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEFor a constraint λ to get values of C qk s as N C qk = C q Rqk 3 /(∑ Rqk 3 ) 1/ 1/ k =1 (14)i.e., without the Constaint3, sub-query incoherency bounds should be allocated in proportion to R1/3 qk . Use this expression to develop heuristics for optimally dividing the client query intosub-queries. If it also considers Constaint3 then, it can model the problem of minimization ofZ, (while satisfying Constaint2 and Constaint3) as a (non-linear) convex optimizationproblem. The non-linear convex optimization problem can be solved using various convexoptimization techniques available in the literature such as gradient descent method, barriermethod etc. It used gradient descent method (fmincon function in MATLAB) to solve thisnonlinear optimization problem to get the values of individual sub-query incoherency boundsfor a given set of sub queries. Here next describe two greedy heuristics to determine sub-queries while using the formulations developed in this section.a) Greedy Heuristics for Deriving the Sub-queries The algorithm gives the outline of greedy algorithm for deriving sub-queries. First,get a set of maximal sub-queries (Mq) corresponding to all the data aggregators in the network.The maximal sub-query for a data aggregator is defined as the largest part of the query whichcan be disseminated by the DA (i.e., the maximal sub-query has all the query data itemswhich the DA can disseminate). For example, consider a client query 50dl +200d2+100d3.Forthe data aggregators a1, and a2, the maximal sub-query for a1 will be ml=50d1 + 100d3, whereasfor a2 it will be m2=50d1 + 200d2. For the given client query (q) and relation consisting of dataaggregators, data-items, and data incoherency bounds (f(A,D,C))maximal sub queries can beobtained for each data aggregator by forming sub-query involving all data items in theintersection of query data items and those being disseminated by the DA. This operation canbe performed in O|q|.max| Dk |.Where |q| is number of data items in the query max| Dk |, is the maximum number of dataitems disseminated by any DA. For each sub-query m ∈ M q , its sumdiff Rm can be calculatedusing Equation (11). Different criteria (ψ ) can be used to select a sub-query in each iterationof various greedy heuristics. All data items covered by the selected sub-query are removedfrom all the remaining sub queries in M q before performing the next iteration. It should benoted that sub-queries for DAs can be null. Now it describe two criteria (ψ ) for the greedyheuristics;1) min-cost: estimate of query execution cost is minimized, and2) max-gain: estimated gain due to executing the query using sub-queries is maximized.a) Minimum Cost HeuristicAs there is need to minimize the query cost, a sub-query with minimum cost per data item can bechosen in each iteration of the algorithm criterion ψ ≡ minimize ( R m / C m | m |) . But from 2Equation (14) it can see that the sub-query incoherency bounds should be allocated inproportion to Rk / 3 . Using Equations (12) and (14) 1 308
  • 9. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEME N 1 Z 1/3 q = C 2/3 ∑ k =1 R qk/ 3 1 qFrom Equation (15), it is clear that for minimizing the query execution cost it should selectthe set of sub queries so that is minimized. It can do that by using criterion ψ ≡ ∑ R 1/3 qkminimize ( Rm/ 3 / m |) in the greedy algorithm. Once IT get the optimal set of sub-queries it can 1use Equation (15) and Constraint 3 (Cqk ≥ Tqk) to optimally allocate the query incoherencybound among them using any of the convex optimization techniques. But this method of firstderiving sub-queries and then allocating the incoherency bounds has a problem which isdescribed nextb) Maximum Gain Heuristic Now here presents an algorithm which instead of minimizing the estimated queryexecution cost maximizes the estimated gains of executing the client query using subqueries. In this algorithm, for each sub-query, here calculate the relative gain of executingit by finding the sumdiff difference between cases when each data item is obtainedseparately and when all the data items are aggregated as a single sub-query (i.e.,maximal sub-query). Thus, the relative gain for a sub-query ∑ wi di , can be written as: ∑w d i i i Gm = −1 ∑ w R + ∑∑ ρ i 2 i i 2 j i ij wi w j Ri R jWhere Ri is sumdiff of the data item di this algorithm can be implemented by usingcriterion ψ ≡ maximize ( G m / | m |) to get the set of sub-queries and corresponding DAs.Then it uses the convex optimization method to allocate incoherency bounds among subqueries. To tackle the query satisfiability issue the query gain Equation (16) is modifiedto: (∑ wiTi ) i G m = Gm − Cq Rm/ 3 1where T, is tightest incoherency bound that can be satisfied for the data item d; and R",is the sub-query sumdiff Reasons for selecting the particular extended objective function aresame as ones outlined for the min-cost heuristic To summarize, for a given client queryand a network of data aggregators, first it get the maximal sub-queries for all dataaggregators. Here it uses heuristics described in this section to derive sub-queries. Inthese heuristics extended objective functions are used to have the desired level of querysatisfiability. 309
  • 10. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEVII. QUERY PLANNING FOR MAX QUERIES This describes the optimal query planning for MAX queries. MIN queries can behandled in the similar manner. A MAX query, where a client wantsthe maximum of a specified set of data item values, can be written as: Vq (t ) = max(vqi (t ),1 ≤ i ≤ nq )For MAX queries, relationship between the query incoherency bound and required dataincoherency bounds. According to one such formulation, if the network of aggregatorscan ensure that the ith data item has incoherency bound C; then the following conditionensures that the query incoherency bound Cq is satisfied: Ci ≤ Cq , ∀i,1 ≤ i ≤ nqIn these queries even if values of one or more data item change (changing theirindividual incoherencies) it is possible that query incoherency remains unchanged.Thus, for a given MAX query, it is possible to have an individual data (or sub-query)incoherency bound which is more than the query incoherency bound. But such anincoherency bound will depend on instantaneous values of data items thus changingvery dynamically.a) Query Cost Model Let us consider a query Q= MAX (A, B), which is used for disseminating max ofdata items A and B from a data aggregator. Let the sumdiff values of A and B is R. and Rbrespectively. For a MAX query, the query result is the maximum of data item values.Thus the query dynamics is decided as per the dynamics of the data item with themaximum value. Hence, the query sumdiff is nothing but weighted average of datasumdiffs, weighted by fraction of time when the particular data item is maximum: nq nq Rq ≈ ∑ Ri ( i =1 ∏ p( x j =1, j ≠ i i > x j ) ≤ max(Ri | 1 ≤ i ≤ nq )where P(Xit > Xj) is the probability that value of ith data item is more than value of jth dataitem.It has the values of data item sumdiffs but for getting the probabilities it need to haveexact values of data items. As query plan dependent on individual data values (insteadof data dynamics) will be too volatile, as a first approximation it uses upper bound ofthe expression given by Equation (20) as query sumdiff. Approximation used is themaximum ofsumdiffs of data items involved. Now it considers the optimized execution of MAXqueries using the above mentioned query cost model. 310
  • 11. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEb) Greedy Heuristics It uses greedy algorithm for solving the query planning problem with differentset of sub-query selection criteria (ψ ). In the min-cost heuristic it selects the subquery having minimum sub-query sumdiff per data item. For the MAX query, sub-query sumdiff is nothing but the sumdiff of the most dynamic data item in the sub-query. Thus, for the max-gain heuristic, the gain of each sub- query is calculated asgiven in Equation (21):result ← φwhile M q ≠ φ{ Choose a sub-query mi ∈ M q with criterion ψ result ← result ∪ mi ; M q ← M q − {mi } ; For each data item d ∈ mi For each mi ∈ M q mi ← m j − {d }; if mi = φ M q ← M q − {mi }else calculate sumdiff for modified m j ;return result}VIII. CONCLUSION Here, presented a cost based approach to minimize the number of refreshesrequired to execute an incoherency bounded continuous query. It is assume theexistence of a network of data aggregators, where each DA is capable ofdisseminating a set of data items at their pre-specified incoherency bounds.Developed an important measure for data dynamics in the form of sumdiff which, isa more appropriate measure compared to the widely used standard deviation basedmeasures. For optimal query execution it divides the query into sub-queries andevaluates each sub-query at a judiciously chosen data aggregator. Performanceresults show that by my method the query can be executed using less than one thirdthe messages required for existing schemes. It is showed that the following featuresof the query planning algorithms improve performance:1) Dividing the query into sub-queries (rather than data items) and executing themat specifically chosen data aggregators.2) Deciding the query plan using sumdiff based mechanism specifically bymaximizing sub-query gains. Executing queries such that more dynamic data itemsare part of a larger sub-query. 311
  • 12. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 1, January- February (2013), © IAEMEREFERENCES[1] Rajeev Gupta, and Krithi Ramamritham,” Query Planning for Continuous Aggregation Queries over a Network of Data Aggregators” IEEE 2012 Transactions on Knowledge and Data Engineering ,Volume 24,Issue:6[2] D. VanderMeer, A. Datta, K Dutta, H Thomas and K Ramamritham, "Proxy- Based Acceleration of Dynamically Generated Content on the World Wide Web", ACM Transactions on Database Systems (faDS) Vol. 29,June2004.[3] J. Dilley, B. Maggs, J. Parikh. H Prokop, R. Sitaraman and B. Weihl, "Globally Distributed Content Delivery", IEEE Internet Computing Sept 2002.[4] S. Rangarajan. S. Mukerjee and P. Rodriguez, "User Specific Request Redirection in a Content Delivery Network", 8th Inti. Workshop on Web Content Caching and Distribution (IWCW), 2003.[5] S. Shah, K Ramamritham, and P. Shenoy, "Maintaining Coherency of Dynamic Data in Cooperating Repositories", VIDB 2002.[6] T. H Cormen, Charles E. Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press and McGraw-Hill 2001.[7] Y. Zhou, B. Chin Ooit and Kian-Lean Tan, "Disseminating Streaming Data in a Dynamic Environment: An Adaptive and Cost Based Approach", The VIDBJoumal, Issue 17, pg.I465-1483, 2008.[8] Asokan M, “jQuery Websites Performance Analysis Based On Loading Time: An Overview” International journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 1, 2013, pp. 211 - 217, Published by IAEME.[9] Bharathi M A, Vijaya Kumar B P and Manjaiah D.H, “Power Efficient Data Aggregation Based On Swarm Intelligence And Game Theoretic Approach In Wireless Sensor Network” International journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 3, 2012, pp. 184 - 199, Published by IAEME. 312

×