Continuous Self-Updating Query Results over Dynamic Linked Data

Presentation for the PhD Symposium workshop @ESWC 2016

Published in: Engineering

  1. 1. Continuously Self-Updating Query Results over Dynamic Linked Data. Ruben Taelman (@rubensworks), iMinds, Ghent University
  2. 2. Dynamic Linked Data. E.g. a thermometer takes a measurement every minute: “19,05°C” at 30-05-2016 11:00; “19,06°C” at 30-05-2016 11:01; “19,11°C” at 30-05-2016 11:02; “19,08°C” at 30-05-2016 11:03; … Such data is typically exposed as an RDF stream, i.e. a stream of <RDF triple, timestamp> pairs.
  3. 3. Querying continuous data. Clients send queries to the server, e.g. “What is the current temperature?” The server continuously evaluates the queries → the server does all of the work. This is a cause of low public endpoint availability: half of public endpoints have an availability below 95% (Buil-Aranda 2013). → Clients just wait for results.
  4. 4. What if we moved continuous query evaluation to the client? → This would lower server load.
  5. 5. Overview: Research questions; Research approach; Evaluation plan; Preliminary results
  6. 6. Overview: Research questions; Research approach; Evaluation plan; Preliminary results
  7. 7. Research questions: How can we publish dynamic data so that it is queryable together with static data at a low server cost? How can we efficiently store dynamic data and allow efficient transfer to clients? What kind of server interface do we need to enable client-side query evaluation over both static and dynamic data?
  8. 8. Hypotheses 1. Our storage solution can store new data in linear time with respect to the amount of new data. 2. Our storage solution can retrieve data by time or triple values in linear time with respect to the amount of retrieved data. 3. The server cost for our solution is lower than the alternatives. 4. Data transfer is the main factor influencing query execution time.
  9. 9. Overview: Research questions; Research approach; Evaluation plan; Preliminary results
  10. 10. Moving continuous query evaluation to the client
  11. 11. Triple Pattern Fragments (TPF) (Verborgh 2016) does this for static data! Servers can only respond to triple pattern queries; clients need to evaluate full queries locally. → Lowers server complexity.
  12. 12. How I will do this for dynamic data: Storage; Transmission; Query evaluation
  13. 13. Storage. How do we efficiently store and retrieve dynamic data (indexing)? It depends on the use case. Querying at a certain time (indexing by time): “What was the temperature in Ghent yesterday?” Querying for a certain time (indexing by property): “When was it 20°C in Ghent?” Can we, or do we have to, combine these indexing techniques?
  14. 14. Transmission. Disadvantage: moving query evaluation to the client requires more data to be transferred → increases bandwidth usage → slows down query evaluation → limits query frequency. Possible solutions: compression within and between versions; caching; higher data selectivity.
  15. 15. Query Evaluation. Scope: data with a predictable valid time. Some thermometers measure once per minute → the data will not change during that minute. Otherwise we need to poll or keep a persistent server connection. Annotate data with its valid time: Thermometer_1: 10°C (10:00 - 10:01); Thermometer_1: 20°C (10:01 - 10:02); Thermometer_1: 20°C (10:02 - 10:03). → Clients can fetch this data as if it were static data.
  16. 16. Overview: Research questions; Research approach; Evaluation plan; Preliminary results
  17. 17. Evaluation of the three parts. Storage: insertion, lookup, size. Transmission: latency, bandwidth, cacheability. Query evaluation: result latency.
  18. 18. Combined evaluation. Realistic datasets/data streams and queries. Compare with: server-side: C-SPARQL (Barbieri 2012), CQELS (Le-Phuoc 2011); client-side: Ztreamy (Fisteus 2014). Compare by: latency, completeness, server load, client load, scalability. → LSBench (Le-Phuoc 2012), SRBench (Zhang 2012), CityBench (Ali 2015), ...
  19. 19. Overview: Research questions; Research approach; Evaluation plan; Preliminary results
  20. 20. Preliminary scalability test. Query Streamer prototype (Taelman 2016), based on TPF. Test server load for an increasing number of clients. Compared with C-SPARQL and CQELS.
  21. 21. Query Streamer moves load from server to client. [Figures: server scalability; client load]
  22. 22. Overview: Research questions; Research approach; Evaluation plan; Preliminary results
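
The division of labour described on slide 11 (servers answer only single triple patterns, clients evaluate the full query) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the in-memory DATA list stands in for a server's dataset, and the names server_fragment, temperatures_in, locatedIn and hasTemperature are hypothetical. It is not the TPF protocol itself, which works over HTTP with paged responses.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
# A triple pattern: None acts as a wildcard (variable) in that position.
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical dataset standing in for what a server would hold.
DATA: list[Triple] = [
    ("Thermometer_1", "locatedIn", "Ghent"),
    ("Thermometer_2", "locatedIn", "Brussels"),
    ("Thermometer_1", "hasTemperature", "19.05"),
    ("Thermometer_2", "hasTemperature", "21.30"),
]


def server_fragment(pattern: Pattern) -> Iterator[Triple]:
    """Server side: the only operation offered is matching ONE triple pattern."""
    for triple in DATA:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def temperatures_in(city: str) -> list[tuple[str, str]]:
    """Client side: evaluate a two-pattern query by joining locally.

    Query: ?sensor locatedIn <city> . ?sensor hasTemperature ?temp
    """
    results = []
    # First pattern: find the sensors in the city.
    for sensor, _, _ in server_fragment((None, "locatedIn", city)):
        # Second pattern: one extra request per binding of ?sensor.
        for _, _, temp in server_fragment((sensor, "hasTemperature", None)):
            results.append((sensor, temp))
    return results
```

Calling temperatures_in("Ghent") issues one pattern request for the location and one per matching sensor. The join work sits with the client, at the price of more requests and more transferred data, which is the trade-off slide 14 raises.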

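The two access paths from slide 13 (indexing by time and indexing by property value) could be combined in a single store along the following lines. This is a sketch under stated assumptions, not the storage solution the research proposes: the class DualIndexStore and its methods are hypothetical, measurements are assumed to arrive in time order, and timestamps are assumed to be ISO 8601 strings so that string order equals time order.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class DualIndexStore:
    """Hypothetical store with a time index and a value index."""

    def __init__(self) -> None:
        self._times: list[str] = []      # sorted timestamps (time index)
        self._values: list[float] = []   # values, parallel to _times
        self._by_value: dict[float, list[str]] = defaultdict(list)  # value index

    def append(self, timestamp: str, value: float) -> None:
        # Assumption: data arrives in time order, so appending keeps _times sorted.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("measurements must arrive in time order")
        self._times.append(timestamp)
        self._values.append(value)
        self._by_value[value].append(timestamp)

    def between(self, start: str, end: str) -> list[tuple[str, float]]:
        """Index by time: all measurements with start <= timestamp <= end."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

    def when(self, value: float) -> list[str]:
        """Index by property value: all timestamps at which the value was measured."""
        return list(self._by_value.get(value, []))
```

With the four measurements from slide 2 appended, between() answers "what was the temperature during this interval?" and when() answers "when was it this temperature?". Keeping both indexes costs extra space on every insert, which is one way to read the open question on that slide.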
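
The valid-time annotation from slide 15 lets a client decide for itself when a result becomes stale, without polling or a persistent connection. A minimal sketch, assuming half-open validity intervals (valid from the start time up to, but not including, the end time) and using hypothetical names (AnnotatedValue, current_values, next_refresh):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AnnotatedValue:
    """A measurement annotated with the interval during which it is valid."""
    subject: str
    value: str
    valid_from: datetime
    valid_until: datetime


def current_values(data: list[AnnotatedValue], now: datetime) -> list[AnnotatedValue]:
    """Values valid at `now`; assumes intervals are [valid_from, valid_until)."""
    return [d for d in data if d.valid_from <= now < d.valid_until]


def next_refresh(data: list[AnnotatedValue], now: datetime) -> Optional[datetime]:
    """Earliest expiry among the currently valid values.

    Until that moment the fetched data can be treated as static; afterwards
    the client should fetch again. Returns None if nothing is valid at `now`.
    """
    expiries = [d.valid_until for d in current_values(data, now)]
    return min(expiries) if expiries else None
```

For the three Thermometer_1 entries on the slide, a client asking at 10:01:30 would see 20°C and learn that it need not contact the server again before 10:02.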