A user from an emergency management agency sends the following query to the flood sensor database: “For the next 3 hours, retrieve every 10 minutes the maximum rainfall level in each county in Southern California, if it is greater than 3.0 inches.”
Select max(rainfall_level), county from Sensors
where state = 'Southern California' group by county
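A single sampling round of this query can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table layout, the sample readings, and the function name max_rainfall_by_county are hypothetical, and the HAVING clause is added here to express the "greater than 3.0 inches" condition from the request. The 3-hour duration and 10-minute period are not modeled; a real system would repeat this round every 10 minutes.

```python
import sqlite3


def max_rainfall_by_county(rows, threshold=3.0):
    """Run one sampling round: per-county maximum, kept only above threshold."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per sensor reading.
    conn.execute(
        "CREATE TABLE Sensors (county TEXT, state TEXT, rainfall_level REAL)"
    )
    conn.executemany("INSERT INTO Sensors VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT max(rainfall_level), county FROM Sensors "
        "WHERE state = 'Southern California' "
        "GROUP BY county HAVING max(rainfall_level) > ? "
        "ORDER BY county",
        (threshold,),
    ).fetchall()
    conn.close()
    return result


# Hypothetical readings: (county, state, rainfall in inches).
readings = [
    ("Orange", "Southern California", 3.4),
    ("Orange", "Southern California", 2.1),
    ("San Diego", "Southern California", 1.2),
    ("Kern", "Central California", 4.0),
]
print(max_rainfall_by_county(readings))
```

Only the Orange county maximum passes both the state filter and the threshold in this sample data.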
$(S_{I_1}, S_{I_2})$ join: a join operator that relates tuples having the same timestamp TS. For every new tuple read on one of the input streams, the join operator checks whether the last tuple read from the other stream has the same timestamp.
$(S_{I_1}, S_{I_2})$ sync-join, where $S_{I_2}$ is an on-demand stream: the sync-join requests the activation of $S_{I_2}$ only when a tuple arrives on $S_{I_1}$.
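The two operators above can be sketched as Python generators. This is a minimal illustration, not the original system's implementation: the event encoding (side, ts, value), the callback sample_right, and both function names are assumptions made for the example.

```python
def timestamp_join(events):
    """Join two streams on timestamp.

    events: iterable of (side, ts, value) in arrival order, side is 0 or 1.
    For each new tuple, compare its timestamp with the last tuple read
    from the other stream and emit (ts, left_value, right_value) on a match.
    """
    last = [None, None]  # last (ts, value) seen on each input stream
    for side, ts, value in events:
        last[side] = (ts, value)
        other = last[1 - side]
        if other is not None and other[0] == ts:
            yield (ts, last[0][1], last[1][1])


def sync_join(left, sample_right):
    """Sync-join with an on-demand right stream.

    left: iterable of (ts, value) from the always-on stream.
    sample_right: hypothetical callback that activates the on-demand stream
    for a timestamp and returns its value, or None if nothing was read.
    """
    for ts, value in left:
        right_value = sample_right(ts)  # the only place the right stream is activated
        if right_value is not None:
            yield (ts, value, right_value)


events = [(0, 1, "a"), (1, 1, "b"), (0, 2, "c"), (0, 3, "d"), (1, 3, "e")]
print(list(timestamp_join(events)))
```

In the sync-join, the right-hand sensor is never sampled unless the left-hand stream produces a tuple, which is where the energy saving comes from.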
SELECT * FROM 1.Magnetism, 2.Acceleration, 3.Temperature WHERE p1(1.Magnetism) and p2(2.Acceleration) and p3(3.Temperature) EVERY 1000
where p1, p2, and p3 are predicates on the magnetism, acceleration, and temperature readings, respectively, with probabilities Pr(p1) = 0.01, Pr(p2) = 0.05, and Pr(p3) = 0.1.
Analysis of the cost of execution:
QP1 is obtained by applying the left-deep join trees rule.
QP2 is obtained from QP1 by applying the selection push-down rule and allocating the selections on the nodes where the data are generated.
QP3 is obtained from QP2 by applying the rules for transforming joins into sync-joins.
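To see why sync-joins with pushed-down selections can reduce cost, the following sketch estimates how many streams are sampled per epoch in a chain of sync-joins. It rests on assumptions that are not stated in the text: the predicates are independent, each sample has unit cost, and a stream is activated only if the predicates on all earlier streams were satisfied. The function name expected_activations is hypothetical.

```python
def expected_activations(probabilities):
    """Expected number of streams sampled per epoch in a sync-join chain.

    probabilities: predicate probabilities in activation order.
    The first stream is always sampled; each later stream is sampled only
    when every earlier predicate held (independence assumed).
    """
    expected = 0.0
    reach = 1.0  # probability that the chain reaches the current stream
    for p in probabilities:
        expected += reach
        reach *= p
    return expected


# Order from the query: magnetism (0.01), acceleration (0.05), temperature (0.1).
print(expected_activations([0.01, 0.05, 0.1]))
# Reverse order, for comparison.
print(expected_activations([0.1, 0.05, 0.01]))
```

Under these assumptions, roughly 1.01 streams are sampled per epoch instead of 3, and starting with the most selective predicate gives the lower expected count.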
(a) A count field is associated with the epoch duration field as well as with each entry in the various lists (attribute list, agg list, and predicate list); it denotes the number of user queries that require that piece of data. This facilitates the maintenance of the synthetic query when user queries terminate.
(b) A from list field contains the user queries which the synthetic query is responsible for.
(c) A flag field denotes the current status of this synthetic query.
(d) A benefit field indicates the benefit that can be gained by the synthetic query (in comparison to processing the individual user queries).
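The fields listed in (a) to (d) can be sketched as a small Python data structure. This is an illustration only: the class name, field names, the flag value "active", and the rule that every merged user query increments the epoch count are assumptions, not the original system's definitions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SyntheticQuery:
    epoch_duration: int
    epoch_count: int = 0  # (a) number of user queries relying on the epoch
    attribute_counts: Counter = field(default_factory=Counter)  # (a)
    agg_counts: Counter = field(default_factory=Counter)        # (a)
    predicate_counts: Counter = field(default_factory=Counter)  # (a)
    from_list: set = field(default_factory=set)  # (b) covered user queries
    flag: str = "active"   # (c) status; the value set is hypothetical
    benefit: float = 0.0   # (d) gain over running the queries separately

    def _lists(self, attributes, aggs, predicates):
        return (
            (self.attribute_counts, attributes),
            (self.agg_counts, aggs),
            (self.predicate_counts, predicates),
        )

    def add_user_query(self, qid, attributes=(), aggs=(), predicates=()):
        self.from_list.add(qid)
        self.epoch_count += 1
        for counts, items in self._lists(attributes, aggs, predicates):
            counts.update(items)

    def remove_user_query(self, qid, attributes=(), aggs=(), predicates=()):
        """Decrement counts; entries no user query needs are dropped."""
        self.from_list.discard(qid)
        self.epoch_count -= 1
        for counts, items in self._lists(attributes, aggs, predicates):
            counts.subtract(items)
            for item in items:
                if counts[item] <= 0:
                    del counts[item]


sq = SyntheticQuery(epoch_duration=4096)
sq.add_user_query("q1", attributes=["light", "temp"])
sq.add_user_query("q2", attributes=["temp"])
sq.remove_user_query("q1", attributes=["light", "temp"])
print(sq.attribute_counts, sq.from_list)
```

When q1 terminates, the count for "light" drops to zero and the attribute is removed, while "temp" stays because q2 still needs it. This is the maintenance role the count field plays.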
The transmission cost of a result message from one node to another can be estimated as $C_{start} + C_{trans} \cdot len(q_i)$.
To measure the average transmission cost incurred by $q_i$ per unit of time, we have to estimate the number of transmissions per unit of time incurred by $q_i$, which depends on the number of result messages generated by the sensors as well as the number of hops required to forward the messages back to the base station.
First, we look at the number of result messages generated per unit of time by a set of sensor nodes $N_k$, denoted $result(q_i, N_k)$. At the end of each epoch of $q_i$, one result message is generated by each sensor node whose readings satisfy the predicates of $q_i$. Therefore, we have
\[ result(q_i, N_k) = \frac{sel(q_i, N_k) \cdot |N_k|}{epoch_i} \tag{1} \]
where $sel(q_i, N_k)$ is the selectivity of the query predicates over $N_k$, i.e., the fraction of sensor nodes in $N_k$ whose readings satisfy the query predicates, and $epoch_i$ is the epoch length of $q_i$.
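A small numeric sketch of these estimates follows. The functions result_rate and message_cost mirror equation (1) and the per-message cost formula. The third function, which multiplies the message rate by an average hop count, is an assumption made for illustration; the text only states that the cost depends on both quantities. All parameter values are hypothetical.

```python
def result_rate(selectivity, num_nodes, epoch):
    """Equation (1): result messages per unit of time from a node set."""
    return selectivity * num_nodes / epoch


def message_cost(c_start, c_trans, length):
    """Cost of sending one result message over one hop."""
    return c_start + c_trans * length


def transmission_cost_rate(selectivity, num_nodes, epoch,
                           avg_hops, c_start, c_trans, length):
    """Assumed combination: rate * average hops * per-hop message cost."""
    return (result_rate(selectivity, num_nodes, epoch)
            * avg_hops
            * message_cost(c_start, c_trans, length))


# Hypothetical values: 25% of 40 nodes match, epoch of 4 time units.
print(result_rate(0.25, 40, 4))
print(message_cost(2.0, 0.5, 8))
print(transmission_cost_rate(0.25, 40, 4, 3, 2.0, 0.5, 8))
```

Halving the selectivity or doubling the epoch length halves the message rate, which is why merging queries with compatible epochs and predicates can pay off.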
Sharing over time - more progressive sharing over time is achieved by scheduling the data acquisition and transmission of all queries as a whole.
At the end of a query’s propagation phase, setSampleRate is triggered, which may start (or restart) the node’s clock to fire at an interval equal to the GCD of the epoch durations of all the queries. We set the epoch start time on sensor nodes to be divisible by the epoch duration, instead of using the arrival time of a new query (here we assume that every epoch duration is divisible by 2048 ms).
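The clock setting described above can be sketched as follows. This is an illustration only: the function names are hypothetical and do not reproduce setSampleRate itself; the sketch just shows the GCD computation, the alignment of epoch starts to multiples of the epoch duration, and which queries are due on a given clock tick.

```python
from functools import reduce
from math import gcd


def sample_interval(epoch_durations_ms):
    """Clock interval: GCD of the epoch durations of all running queries."""
    return reduce(gcd, epoch_durations_ms)


def next_epoch_start(now_ms, epoch_ms):
    """Smallest multiple of epoch_ms that is not earlier than now_ms."""
    return -(-now_ms // epoch_ms) * epoch_ms  # ceiling division


def queries_due(tick_ms, epoch_durations_ms):
    """Epoch durations whose epoch boundary falls on this clock tick."""
    return [e for e in epoch_durations_ms if tick_ms % e == 0]


epochs = [4096, 6144, 10240]  # hypothetical, all multiples of 2048 ms
print(sample_interval(epochs))
print(next_epoch_start(5000, 4096))
print(queries_due(12288, epochs))
```

Because epoch starts are aligned to multiples of the epoch duration rather than to query arrival times, queries with the same duration sample at the same instants and can share one reading.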
Sharing over space - After the sample rate has been set at each node, data will be retrieved periodically and transmitted out of the network to the base station. During the query result collection, we use the optimization heuristics to aggressively share data over space.
Each sensor node dynamically selects a route (parent) that is aware of the query space (unlike a TinyDB network, which selects the parent by link quality); meanwhile, it tries to take advantage of the broadcast nature of the radio channel to satisfy multiple queries with one message.
Queries are flooded throughout the network from the base station. The exact set of sensors that have data for the query is not known a priori to the base station, and the set of sensor nodes can vary with time.
Let every sensor decide where to propagate to based on its local information about neighbors.
When a query is propagated from node x at level i to level i + 1, node x checks whether it has the data the query retrieves, and piggybacks this information on the query as it is sent down.
Meanwhile, the DAG is formed by having an edge from every node to each of its upper-level neighbors (if the network is dense, not all neighbors need to be maintained, only the neighbors that also have query results to transmit).
If the data at node x does not satisfy any query, x switches into sleep mode and will wake up after a predefined time.
When it wakes up, if it finds that its current data satisfies a query, it sends a one-hop broadcast message so that its lower level neighbors would consider the node as an option to relay its data.
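The level-based DAG described above can be sketched as follows. This is a simplified, centralized illustration of what each node would decide locally: the function names and the example topology are hypothetical, levels are taken as hop counts from the base station, and falling back to all upper-level neighbors when none of them has data is an assumption of this sketch.

```python
from collections import deque


def assign_levels(neighbors, base):
    """Hop count from the base station, as produced by query flooding."""
    levels = {base: 0}
    queue = deque([base])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in levels:
                levels[nb] = levels[node] + 1
                queue.append(nb)
    return levels


def upper_level_edges(neighbors, levels, has_data=None):
    """Map each node to its upper-level neighbors (candidate parents).

    If has_data is given, keep only upper-level neighbors that also have
    query results to transmit, when at least one such neighbor exists.
    """
    dag = {}
    for node, level in levels.items():
        uppers = sorted(
            nb for nb in neighbors[node] if levels.get(nb) == level - 1
        )
        if has_data is not None:
            preferred = [nb for nb in uppers if nb in has_data]
            uppers = preferred or uppers  # assumed fallback
        dag[node] = uppers
    return dag


# Hypothetical topology: base station "bs", nodes a and b at level 1, c at level 2.
neighbors = {
    "bs": ["a", "b"],
    "a": ["bs", "b", "c"],
    "b": ["bs", "a", "c"],
    "c": ["a", "b"],
}
levels = assign_levels(neighbors, "bs")
print(upper_level_edges(neighbors, levels))
print(upper_level_edges(neighbors, levels, has_data={"b"}))
```

When only b has data to send, node c keeps b as its sole candidate parent, so the readings of c and b can travel in the same message.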