Continuously Self-Updating Query Results over Dynamic Linked Data
Ruben Taelman - @rubensworks
iMinds - Ghent University
Dynamic Linked Data
E.g. Thermometer measures every minute:
“19,05°C” - 30-05-2016 11:00
“19,06°C” - 30-05-2016 11:01
“19,11°C” - 30-05-2016 11:02
“19,08°C” - 30-05-2016 11:03
Typically exposed as an RDF stream = stream of <RDF triple, timestamp>
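A minimal sketch of what such a stream could look like in code, assuming a hypothetical thermometer identifier (`ex:thermometer1`) and predicate (`ex:temperature`) that do not come from the slides. Each stream element pairs one RDF triple with the time at which it was measured.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Tuple

# An RDF triple reduced to three strings: (subject, predicate, object).
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class StreamElement:
    """One element of an RDF stream: a triple plus its timestamp."""
    triple: Triple
    timestamp: datetime


def thermometer_stream() -> Iterator[StreamElement]:
    """Yield the four example readings as <triple, timestamp> pairs.

    The subject and predicate names are illustrative assumptions.
    """
    readings = [
        ("19.05", "2016-05-30T11:00:00"),
        ("19.06", "2016-05-30T11:01:00"),
        ("19.11", "2016-05-30T11:02:00"),
        ("19.08", "2016-05-30T11:03:00"),
    ]
    for value, iso_time in readings:
        yield StreamElement(
            triple=("ex:thermometer1", "ex:temperature", value),
            timestamp=datetime.fromisoformat(iso_time),
        )


if __name__ == "__main__":
    for element in thermometer_stream():
        print(element.timestamp.isoformat(), element.triple)
```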
Querying continuous data
Clients send queries to server: e.g. What is the current temperature?
Server continuously evaluates the queries
→ Server does all of the work
Cause of low public endpoint availability!
Half of the public endpoints have an availability below 95% (Buil-Aranda 2013)
→ Clients just wait for results
What if we moved continuous query evaluation to the client?
→ to lower server load
How do we publish dynamic data so that it is queryable together with static data
at a low server cost?
How can we efficiently store dynamic data and allow efficient transfer to clients?
What kind of server interface do we need to enable client-side query evaluation over
both static and dynamic data?
1. Our storage solution can store new data in linear time with respect to the
amount of new data.
2. Our storage solution can retrieve data by time or triple values in linear time with
respect to the amount of retrieved data.
3. The server cost for our solution is lower than the alternatives.
4. Data transfer is the main factor influencing query execution time.
Moving continuous query evaluation to the client
Triple Pattern Fragments does this for static data!
Triple Pattern Fragments (TPF) (Verborgh 2016):
Servers can only respond to triple pattern queries
Clients need to evaluate queries locally
→ Lowers server complexity
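The division of labour can be sketched as follows. This is an in-memory illustration only, not the actual TPF HTTP interface (which also involves paging, metadata and hypermedia controls). The data, the `ex:` names and both function names are hypothetical. The "server" answers a single triple pattern; the "client" combines two patterns into a join on its own side.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
# A triple pattern: None acts as a wildcard (variable) in that position.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical dataset held by the server.
DATA: List[Triple] = [
    ("ex:thermometer1", "ex:locatedIn", "ex:Ghent"),
    ("ex:thermometer2", "ex:locatedIn", "ex:Brussels"),
    ("ex:thermometer1", "ex:temperature", "19.08"),
    ("ex:thermometer2", "ex:temperature", "21.30"),
]


def server_fragment(pattern: Pattern) -> List[Triple]:
    """Server side: return all triples matching one triple pattern."""
    return [
        triple
        for triple in DATA
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]


def client_temperatures_in(city: str) -> List[str]:
    """Client side: join two triple patterns locally.

    Roughly: ?sensor ex:locatedIn <city> . ?sensor ex:temperature ?value
    """
    results = []
    for sensor, _, _ in server_fragment((None, "ex:locatedIn", city)):
        for _, _, value in server_fragment((sensor, "ex:temperature", None)):
            results.append(value)
    return results


if __name__ == "__main__":
    print(client_temperatures_in("ex:Ghent"))
```

The server never sees the full query, only individual patterns, which is what keeps its work simple.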
How I will do this for dynamic data
Storage, transmission, query evaluation
How do we efficiently store / retrieve dynamic data? (Indexing)
It depends on the use cases:
Querying at a certain time (Indexing by time)
What was the temperature in Ghent yesterday?
Querying for a certain time (Indexing by property)
When was it 20°C in Ghent?
Can we / Do we have to combine these indexing techniques?
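One way to picture the two indexing techniques side by side, as a sketch under assumptions rather than the storage solution proposed here: the class `ReadingStore`, its method names, and the use of integer minutes as timestamps are all hypothetical. A sorted list serves lookups by time, and a dictionary keyed on the measured value serves lookups by property.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class ReadingStore:
    """Keeps readings in a time index and a value index at once."""

    def __init__(self) -> None:
        self._times: List[int] = []    # sorted timestamps (minutes)
        self._values: List[str] = []   # value at the same position
        self._by_value: Dict[str, List[int]] = defaultdict(list)

    def append(self, timestamp: int, value: str) -> None:
        """Add one reading; assumes readings arrive in time order."""
        if self._times and timestamp < self._times[-1]:
            raise ValueError("readings must be appended in time order")
        self._times.append(timestamp)
        self._values.append(value)
        self._by_value[value].append(timestamp)

    def between(self, start: int, end: int) -> List[Tuple[int, str]]:
        """Index by time: all readings with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

    def when(self, value: str) -> List[int]:
        """Index by property: all timestamps at which `value` was measured."""
        return list(self._by_value.get(value, []))


if __name__ == "__main__":
    store = ReadingStore()
    for minute, temperature in enumerate(["19.05", "19.06", "19.11", "19.08"]):
        store.append(minute, temperature)
    print(store.between(1, 2))
    print(store.when("19.11"))
```

Maintaining both indexes duplicates some data, which is one side of the trade-off the question above raises.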
Moving query evaluation to the client requires more data to be transferred
→ Increases bandwidth usage
→ Slows down query evaluation
→ Limits query frequency
Compression within and between versions
Higher data selectivity
Scope: Data with a predictable valid time
Some thermometers measure once per minute → the data will not change during that minute.
Otherwise we need to poll or have a persistent server connection
Annotate data with their valid time:
Thermometer_1 : 10°C (10:00 - 10:01)
Thermometer_1 : 20°C (10:01 - 10:02)
Thermometer_1 : 20°C (10:02 - 10:03)
→ Clients can fetch this data as if it were static data
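A small sketch of how a client might use these annotations, assuming half-open validity intervals and an arbitrary example date. The names `AnnotatedReading`, `current_reading` and `seconds_until_refresh` are hypothetical. The client picks the reading that is valid now and derives from its end time when it needs to fetch again, so neither polling nor a persistent connection is required.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class AnnotatedReading:
    sensor: str
    value: str
    valid_from: datetime   # inclusive
    valid_until: datetime  # exclusive (an assumption of this sketch)


# The three annotated readings from above; the date itself is assumed.
READINGS: List[AnnotatedReading] = [
    AnnotatedReading("Thermometer_1", "10",
                     datetime(2016, 5, 30, 10, 0), datetime(2016, 5, 30, 10, 1)),
    AnnotatedReading("Thermometer_1", "20",
                     datetime(2016, 5, 30, 10, 1), datetime(2016, 5, 30, 10, 2)),
    AnnotatedReading("Thermometer_1", "20",
                     datetime(2016, 5, 30, 10, 2), datetime(2016, 5, 30, 10, 3)),
]


def current_reading(readings: List[AnnotatedReading],
                    now: datetime) -> Optional[AnnotatedReading]:
    """Return the reading whose valid time contains `now`, if any."""
    for reading in readings:
        if reading.valid_from <= now < reading.valid_until:
            return reading
    return None


def seconds_until_refresh(readings: List[AnnotatedReading],
                          now: datetime) -> float:
    """Seconds the current result stays valid; 0.0 means fetch right away."""
    reading = current_reading(readings, now)
    if reading is None:
        return 0.0
    return (reading.valid_until - now).total_seconds()


if __name__ == "__main__":
    now = datetime(2016, 5, 30, 10, 1, 30)
    print(current_reading(READINGS, now), seconds_until_refresh(READINGS, now))
```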
Evaluation of the three parts
Insertion, lookup, size
Latency, bandwidth, cacheability