Ruben Taelman - @rubensworks
iMinds - Ghent University
Continuously Updating Query Results over Real-Time Linked Data
Dynamic Linked Data
E.g. Thermometer measures every minute:
“19,05°C” - 30-05-2016 11:00
“19,06°C” - 30-05-2016 11:01
“19,11°C” - 30-05-2016 11:02
“19,08°C” - 30-05-2016 11:03
…
Typically exposed as an RDF stream = stream of <RDF triple, timestamp>
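A minimal sketch of how a consumer might model such a stream, using the thermometer readings above. The names `TimestampedTriple`, `thermometer_stream`, `latest`, and the `ex:` identifiers are hypothetical and only serve the illustration; decimal points replace the decimal commas of the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class TimestampedTriple:
    """One stream element: an RDF triple paired with its timestamp."""
    triple: Triple
    timestamp: datetime


def thermometer_stream() -> Iterator[TimestampedTriple]:
    """Yield the four example readings as <triple, timestamp> pairs."""
    readings = [
        ("19.05", datetime(2016, 5, 30, 11, 0)),
        ("19.06", datetime(2016, 5, 30, 11, 1)),
        ("19.11", datetime(2016, 5, 30, 11, 2)),
        ("19.08", datetime(2016, 5, 30, 11, 3)),
    ]
    for value, moment in readings:
        # ex:thermometer and ex:temperature are made-up identifiers.
        yield TimestampedTriple(("ex:thermometer", "ex:temperature", value), moment)


def latest(stream: Iterable[TimestampedTriple]) -> TimestampedTriple:
    """Answer 'what is the current temperature?' for a finite stream prefix."""
    return max(stream, key=lambda element: element.timestamp)
```

Calling `latest(thermometer_stream())` picks the element with the most recent timestamp, which is the 11:03 reading.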
Querying continuous data
Clients send queries to server: e.g. What is the current temperature?
Server continuously evaluates the queries
→ Server does all of the work
Cause of low public endpoint availability!
½ of public SPARQL endpoints have an availability below 95% (Buil-Aranda 2013)
→ Clients just wait for results
What if we moved continuous query evaluation to the client?
→ to lower server load
Triple Pattern Fragments does this for static data!
Triple pattern fragments (TPF) (Verborgh 2016):
Servers can only respond to triple pattern queries
Clients need to evaluate queries locally
→ Lowers server complexity
Can we do the same for dynamic data?
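The division of work on this slide can be sketched as follows: the server only matches a single triple pattern, and the client combines several pattern requests into a join. This is a simplified in-memory illustration, not the TPF protocol itself (paging, metadata, and hypermedia controls are left out). `TriplePatternServer`, `stations_playing_artist`, the `m:artist` predicate, and the extra example triples are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
# None acts as a wildcard (a variable) in a triple pattern.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class TriplePatternServer:
    """Stand-in for a server that can only answer single triple patterns."""

    def __init__(self, triples: List[Triple]) -> None:
        self._triples = list(triples)

    def fragment(self, pattern: Pattern) -> Iterator[Triple]:
        """Yield every triple whose terms match the non-wildcard positions."""
        for triple in self._triples:
            if all(want is None or want == have for want, have in zip(pattern, triple)):
                yield triple


def stations_playing_artist(server: TriplePatternServer, artist: str) -> List[str]:
    """Client-side join of: ?song m:artist <artist> . ?station m:plays ?song ."""
    stations: List[str] = []
    for song, _, _ in server.fragment((None, "m:artist", artist)):
        # One extra pattern request per intermediate binding.
        for station, _, _ in server.fragment((None, "m:plays", song)):
            stations.append(station)
    return stations


server = TriplePatternServer([
    ("radio:bbc-radio-1", "m:plays", "radio:jauz-netsky-higher"),
    ("radio:jauz-netsky-higher", "m:artist", "radio:netsky"),  # hypothetical triple
    ("radio:other-station", "m:plays", "radio:other-song"),    # hypothetical triple
])
```

The join logic lives entirely in `stations_playing_artist`, so the server stays simple at the cost of more requests from the client.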
Overview
Dynamic data representation
Query streamer engine
Evaluation
Dynamic data representation
Expose dynamic data through the TPF interface
→ Represent dynamic data in RDF
We annotate dynamic data with the time at which they are valid
→ Client can derive the time at which data can change!
But how do we annotate data/triples with time?
Annotation methods
Reification: outdated
Singleton properties (Nguyen 2014): instantiate predicates
Graphs: define fourth element in quad
Implicit graphs: TPF makes triples (de)referenceable
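To make the differences concrete, the sketch below generates the statements that three of these methods would need to attach one time annotation to one triple. It is an illustration under assumptions: statements are plain tuples of prefixed names rather than parsed RDF, the `#1` suffix for the predicate instance and the blank node labels are made up, and the function names are hypothetical. Implicit graphs are left out because they depend on the fragment URLs of a concrete TPF server.

```python
from typing import List, Tuple

# A 3-tuple is a triple; a 4-tuple is a quad with the graph as last element.
Statement = Tuple[str, ...]


def annotate_reification(s: str, p: str, o: str, annotation: str,
                         statement_id: str = "_:stmt1") -> List[Statement]:
    """Describe the triple with the RDF reification vocabulary, then annotate it."""
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
        (statement_id, "tmp:interval", annotation),
    ]


def annotate_singleton_property(s: str, p: str, o: str, annotation: str,
                                index: int = 1) -> List[Statement]:
    """Use a unique instance of the predicate and annotate that instance."""
    instance = f"{p}#{index}"  # naming scheme is an assumption
    return [
        (s, instance, o),
        (instance, "rdf:singletonPropertyOf", p),
        (instance, "tmp:interval", annotation),
    ]


def annotate_graph(s: str, p: str, o: str, annotation: str,
                   graph: str = "_:g1") -> List[Statement]:
    """Put the triple in a named graph (fourth quad element) and annotate the graph."""
    return [
        (s, p, o, graph),
        (graph, "tmp:interval", annotation),
    ]
```

For the same triple, reification needs five statements, singleton properties three, and graphs two, which gives a rough feel for the overhead of each method.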
Time labeling types
Time interval: start and end time of validity; good for maintaining a history of elements
Expiration time: end time of validity; for when only the latest version is required
Dynamic data example
radio:bbc-radio-1 m:plays radio:jauz-netsky-higher.
GRAPH _:g1 {
radio:bbc-radio-1 m:plays radio:jauz-netsky-higher.
}
_:g1 tmp:interval _:interval_1.
_:interval_1 tmp:initial "2016-05-30T09:15:00"^^xsd:dateTime.
_:interval_1 tmp:final "2016-05-30T09:20:00"^^xsd:dateTime.
Graph-annotation: [ 9:15, 9:20 ]
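The annotation above is what lets a client work out when the data can change. A minimal sketch, assuming the interval is closed on both ends as the `[ 9:15, 9:20 ]` notation suggests and that the client re-queries at the earliest end time among the currently valid intervals; `Interval` and `next_refresh` are hypothetical names, not part of the presented engine.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Interval:
    """Validity interval taken from tmp:initial and tmp:final."""
    initial: datetime
    final: datetime

    def is_valid_at(self, moment: datetime) -> bool:
        # Closed interval: both end points count as valid.
        return self.initial <= moment <= self.final


def next_refresh(intervals: Iterable[Interval], now: datetime) -> Optional[datetime]:
    """Earliest end time among intervals valid at `now`, or None if none is valid."""
    finals = [interval.final for interval in intervals if interval.is_valid_at(now)]
    return min(finals) if finals else None


# The interval from the example: valid from 09:15 to 09:20 on 30-05-2016.
song_interval = Interval(
    initial=datetime(2016, 5, 30, 9, 15),
    final=datetime(2016, 5, 30, 9, 20),
)
```

At 09:17 the triple is valid and `next_refresh` returns 09:20, so a client in this sketch would not need to contact the server again before then.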
Overview
Dynamic data representation
Query streamer engine
Evaluation
Query streamer engine
Overview
Dynamic data representation
Query streamer engine
Evaluation
Evaluating annotation methods
Measure query execution times over the duration of the query
Query: “All trains with their delay in station X within the next hour”
Frequency: 10 seconds
Clients: 1
Engine: Query streamer
Annotation methods: singleton property, graph, implicit graph
Time labeling types: time interval, expiration time
Evaluating annotation methods
[Figure with two panels: Time interval, Expiration time]
Evaluating scalability
Measure server CPU usage for increasing # clients
Query: “All trains with their delay in station X within the next hour”
Frequency: 10 seconds
Clients: 1 → 200
Engines: Query streamer, C-SPARQL (Barbieri 2012) and CQELS (Le-Phuoc 2011)
Annotation method: graph
Time labeling type: expiration time
Query Streamer has better scalability
Query Streamer moves load from server to client
Overview
Dynamic data representation
  Annotate dynamic data with time
Query streamer engine
  Client-side query engine
  Dynamic data at TPF server
Evaluation
  Annotation methods
  Scalability
Conclusions
Further evaluation: Different query types, …?
Solve the efficiency problem of time intervals?
Promising approach for improved scalability
