An Experience on Empirical Research about RDF Stream Processing


The invited talk I gave at the EMPIRICAL 2014 workshop at the ESWC 2014

  1. Dipartimento di Elettronica, Informazione e Bioingegneria
     An Experience on Empirical Research about RDF Stream Processing
     Daniele Dell’Aglio (daniele.dellaglio@polimi.it)
     Joint work with: Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle
  2. RDF Stream Processing in a nutshell
     - Continuous queries over RDF streams, i.e. infinite sequences of time-stamped RDF statements
     - Brings together the DSMS/CEP and Semantic Web research fields
     - Several prototypes, with similar models, are available today
     - There is a trend towards evaluating and comparing the existing systems
     (Slide footer: Daniele Dell'Aglio, Experimental research about RSPs, 26 May 2014, EMPIRICAL@ESWC2014)
  3. The CQL model for RSPs
     Pipeline: input stream, S2R operator, R2R operator, R2S operator, output stream.
     - S2R operator: converts the infinite stream of RDF elements into a finite set of mappings
       - The window operators: time-based, tuple-based, …
     - R2R operator: transforms a set of mappings into another set of mappings
       - SPARQL 1.0/1.1 queries
     - R2S operator: each set of mappings produced by the R2R operator is transformed and appended to the output stream
       - Operators: RStream, DStream, IStream
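
As a rough illustration of the R2S step, the three operators can be sketched over sets of mappings. The sketch follows the usual CQL reading: RStream emits the whole current result, IStream only what is new with respect to the previous evaluation, DStream only what has disappeared. The function names and the encoding of a mapping as a tuple of (variable, value) pairs are assumptions made for the example, not the API of any of the engines discussed here.

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding: one solution mapping is a tuple of (variable, value) pairs.
Mapping = Tuple[Tuple[str, str], ...]
Result = FrozenSet[Mapping]


def rstream(previous: Result, current: Result) -> Result:
    """RStream: emit every mapping of the current evaluation."""
    return current


def istream(previous: Result, current: Result) -> Result:
    """IStream: emit only the mappings that were not in the previous evaluation."""
    return current - previous


def dstream(previous: Result, current: Result) -> Result:
    """DStream: emit only the mappings that were in the previous evaluation and are gone now."""
    return previous - current


# Two consecutive evaluations of a query returning a room.
hall: Result = frozenset({(("?room", ":hall"),)})
kitchen: Result = frozenset({(("?room", ":kitchen"),)})
```

With these two evaluations, `istream(hall, kitchen)` yields the kitchen mapping and `dstream(hall, kitchen)` yields the hall mapping, while `istream(hall, hall)` is empty.
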
  4. S2R - Time-based sliding window
     [Figure: stream S with elements S1 to S12 on the time axis t and a sliding window W(ω, β), where ω is the window width and β is the slide; the window content is passed to the R2R operator.]
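
A minimal sketch of a time-based sliding window W(ω, β) may help to fix the roles of width and slide. It assumes half-open windows [start, start + ω) that open every β time units from a starting instant `t0`; the slide does not state this convention, and the function names, the `until` bound and the toy stream with one element per time unit are hypothetical.

```python
from typing import Iterator, List, Tuple


def window_bounds(t0: int, width: int, slide: int, until: int) -> Iterator[Tuple[int, int]]:
    """Yield (open, close) bounds of consecutive windows that close no later than `until`."""
    start = t0
    while start + width <= until:
        yield (start, start + width)
        start += slide


def window_content(stream: List[Tuple[int, str]], bounds: Tuple[int, int]) -> List[str]:
    """Return the elements whose timestamp falls in the half-open interval [open, close)."""
    lo, hi = bounds
    return [item for ts, item in stream if lo <= ts < hi]


# Hypothetical stream: element S<t> arrives at time t, for t = 1..12.
stream = [(t, f"S{t}") for t in range(1, 13)]
windows = list(window_bounds(t0=0, width=4, slide=2, until=12))
```

Because the slide (2) is smaller than the width (4), consecutive windows overlap and each element can appear in more than one window; with slide equal to width the windows become tumbling.
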
  7. Implementations (oversimplified!)
     - C-SPARQL: RDF store + stream processor
       [Figure: continuous query, translator, stream processor and RDF store, continuous results]
     - CQELS: implemented from scratch, with a focus on performance
       [Figure: continuous query, native RSP, continuous results]
     - SPARQLstream: ontology-based stream query answering
       [Figure: continuous query, rewriter with R2RML mappings, DSMS/CEP, continuous results]
  8. Same inputs, different outputs…
     - Let’s consider the following stream S: elements S1, S2, S3, S4 at t = 1, 3, 6, 9, carrying (in slide order) the triples
       :alice :isIn :hall
       :bob :isIn :hall
       :alice :isIn :kitchen
       :bob :isIn :kitchen
     - And the continuous query:
       - Where are Alice and Bob, when they are together?
       - With a tumbling window W(ω=β=5)
     - After 4 executions:

       Execution   1st answer   2nd answer
       1           :hall [6]    :kitchen [11]
       2           :hall [5]    :kitchen [10]
       3           :hall [6]    :kitchen [11]
       4           - [7]        - [12]
  9. The first hypothesis
     - All three systems show similar behaviours
     - Intuition: there are one or more parameters that are not taken into account by the model
     - As a consequence, the implementations can output different correct answers
  10. The first hypothesis
      - HP1: it is possible to have a unique correct answer if we can control the time instant at which the sliding window operator starts to work (t0)
      [Figure: stream S with S1 to S4 at t = 1, 3, 6, 9 (one row for :alice, one for :bob) and the candidate window start instants t0=0, t0=1 and t0=2.]
  11. The experiment
      - We work on the difference between the time instant at which the stream starts (ts) and the query registration time (tq)
        - At each execution, we check the result
        - We estimated the delay between tq and t0
      - Black box approach
        - we work on inputs/outputs
        - the source code of all the systems
      [Figure: timeline showing ts, tq and t0 for the RSP.]
  12. Observation and explanation
      - As a result, for each system
        - We identified the value of the t0 parameter
        - We are able to produce the different results for each t0 value
      - Is it enough to claim that hypothesis 1 holds?

      Exec   1st answer   2nd answer
      1      :hall [6]    :kitchen [11]
      2      :hall [5]    :kitchen [10]
      3      :hall [6]    :kitchen [11]
      4      - [7]        - [12]

      Window   1st answer   2nd answer
      t0=0     :hall [5]    :kitchen [10]
      t0=1     :hall [6]    :kitchen [11]
      t0=2     - [7]        - [12]
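
The dependence of the answers on t0 can be checked with a small simulation of the Alice and Bob example. This is only a sketch under stated assumptions: windows are half-open intervals [t0 + i·5, t0 + (i+1)·5), each window is evaluated when it closes and the answer is stamped with the closing time, and the pairing of timestamps with triples follows the order in which they appear on the slide. The function name and data layout are hypothetical.

```python
from typing import List, Tuple

# (timestamp, person, room); the pairing follows the slide order (assumption).
STREAM = [
    (1, ":alice", ":hall"),
    (3, ":bob", ":hall"),
    (6, ":alice", ":kitchen"),
    (9, ":bob", ":kitchen"),
]


def tumbling_answers(t0: int, width: int, n_windows: int) -> List[Tuple[List[str], int]]:
    """Evaluate 'where are Alice and Bob when they are together?' on consecutive tumbling windows.

    Returns one (rooms, report_time) pair per window, where rooms lists the rooms
    that both people occupy according to the window content.
    """
    answers = []
    for i in range(n_windows):
        lo, hi = t0 + i * width, t0 + (i + 1) * width
        content = [(person, room) for ts, person, room in STREAM if lo <= ts < hi]
        alice_rooms = {room for person, room in content if person == ":alice"}
        bob_rooms = {room for person, room in content if person == ":bob"}
        # The join on the room variable keeps only the rooms shared by both.
        answers.append((sorted(alice_rooms & bob_rooms), hi))
    return answers


for t0 in (0, 1, 2):
    print(f"t0={t0}", tumbling_answers(t0, width=5, n_windows=2))
```

Under these assumptions the simulation gives :hall at 5 and :kitchen at 10 for t0=0, :hall at 6 and :kitchen at 11 for t0=1, and two empty answers at 7 and 12 for t0=2, in line with the second table above.
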
  13. Some considerations on the experiment
      - Comparison:
        - We ran the experiment multiple times to collect instances and check them
      - Reproducibility: can other researchers reproduce the experiment?
        - We released both the code and the data used for the experiment (see http://streamreasoning.org/Benchmarks/)
      - Repeatability: is the result universally valid?
        - We changed inputs (streams and queries) and OS/JVM to verify whether the hypothesis holds
        - We repeated the experiment with different implementations (C-SPARQL, CQELS, etc.)
  14. Something more on repeatability…
      - We made some assumptions on the setting
      [Figure: the S2R, R2R, R2S pipeline annotated with the settings left out by those assumptions: from single to multi window, from single to multi stream, reasoning, static knowledge, multiple queries.]
  15. A more complex problem…
      - As a “side effect” of the first experiment, we discovered that the results of different systems are not the same:

      C-SPARQL
      Exec   1st answer   2nd answer
      1      :hall [6]    :kitchen [11]
      2      :hall [5]    :kitchen [10]
      3      :hall [6]    :kitchen [11]
      4      - [7]        - [12]

      CQELS
      Exec   1st answer   2nd answer
      1      :hall [3]    :kitchen [9]
      2      No answers
      3      :hall [3]    :kitchen [9]
      4      No answers

      - Intuition: t0 is not the only parameter our model lacks
  16. The SECRET framework
      - t0: When does the window start? (internal window param)
      - TICK: When are data stream elements added to the window?
        - Triple-based vs graph-based
      - REPORT: When is the window content made available to the R2R operator?
        - Non-empty content, Content-change, Window-close, Periodic
      [Figure: stream S with elements S1 to S12 on the time axis t and a sliding window W(ω, β) feeding the R2R operator, annotated with the three parameters above.]
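
One possible way to make the REPORT dimension concrete is to treat each of the four strategies as a predicate over the window state and let an engine description be a set of strategies that must all hold. The conjunction, the `WindowState` fields and the `period` parameter are assumptions of this sketch; it does not claim to describe how any specific RSP is implemented.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, Set


class Report(Enum):
    NON_EMPTY_CONTENT = auto()
    CONTENT_CHANGE = auto()
    WINDOW_CLOSE = auto()
    PERIODIC = auto()


@dataclass(frozen=True)
class WindowState:
    """Hypothetical snapshot of the active window at evaluation time."""
    now: int                          # current time instant
    close_time: int                   # instant at which the active window closes
    content: FrozenSet[str]           # elements currently in the window
    previous_content: FrozenSet[str]  # content at the previous evaluation


def should_report(strategies: Set[Report], state: WindowState, period: int = 1) -> bool:
    """Return True when every selected strategy is satisfied (conjunction is an assumption)."""
    checks = {
        Report.NON_EMPTY_CONTENT: len(state.content) > 0,
        Report.CONTENT_CHANGE: state.content != state.previous_content,
        Report.WINDOW_CLOSE: state.now == state.close_time,
        Report.PERIODIC: state.now % period == 0,
    }
    return all(checks[strategy] for strategy in strategies)


# A window that closes now, is not empty, and has not changed since the last evaluation.
state = WindowState(now=5, close_time=5,
                    content=frozenset({"S1", "S2"}),
                    previous_content=frozenset({"S1", "S2"}))
```

For this state, a description made of window-close plus non-empty content reports, while a content-change description does not, which is the kind of divergence between systems that the parameters are meant to capture.
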
  17. SECRET and RSPs
      - HP2: given an input stream, a query, the value of t0 and a description of the RSP w.r.t. SECRET, we can determine the answer that will be provided by the system
      - To investigate it, we built software that computes the expected answer in batch and matches it against the one produced by the RSP
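
The matching step can be illustrated with the numbers reported for the first experiment: given the expected answers for each candidate t0, find which t0 values explain an observed execution. This is a toy sketch and not the actual oracle; the expected and observed answers are copied from the tables of slide 12, while the function name and the encoding of an empty answer as `None` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Answer = Tuple[Optional[str], int]  # (room, or None for an empty answer; report timestamp)

# Expected answers per window start instant, from the "Window" table.
EXPECTED_BY_T0: Dict[int, List[Answer]] = {
    0: [(":hall", 5), (":kitchen", 10)],
    1: [(":hall", 6), (":kitchen", 11)],
    2: [(None, 7), (None, 12)],
}

# Observed answers per execution, from the "Exec" table.
OBSERVED: Dict[int, List[Answer]] = {
    1: [(":hall", 6), (":kitchen", 11)],
    2: [(":hall", 5), (":kitchen", 10)],
    3: [(":hall", 6), (":kitchen", 11)],
    4: [(None, 7), (None, 12)],
}


def explaining_t0(observed: List[Answer],
                  expected_by_t0: Dict[int, List[Answer]]) -> List[int]:
    """Return every candidate t0 whose expected answers equal the observed ones."""
    return [t0 for t0, expected in sorted(expected_by_t0.items()) if expected == observed]


explained = {run: explaining_t0(answers, EXPECTED_BY_T0) for run, answers in OBSERVED.items()}
print(explained)
```

An execution for which the returned list is empty would be one that the model cannot explain, which is the case that calls for further investigation.
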
  18. Observation and analysis
      - We prepared a set of seven queries (to stress different parts of the sliding window)
      - We ran each query multiple times
      - Most of the time, we can foresee the answer that will be provided
      [Table: results for queries Q1 to Q7 on CQELS, C-SPARQL and SPARQLstream; the cell values are not recoverable from the transcript.]
  19. Observation and analysis
      - We investigated the observations where there is no match, and we discovered that they were due to errors in the implementations, concerning for example:
        - Initialization
        - Slide parameter
        - Window contents
        - Internal timestamp management
      - Conclusion: HP2 seems to be valid in the considered setting
  20. CSR-bench
      - The main outcome of our experience is CSR-bench, an extension of SRBench
        - More info at http://www.w3.org/wiki/CSRBench
      - Three main components:
        - A common model for the RDF stream processor operational semantics
        - An oracle (an automatic correctness validator), available at https://github.com/dellaglio/csrbench-oracle
        - A test suite
  21. References
      - Dell'Aglio, D., Balduini, M., Della Valle, E.: On the need to include functional testing in RDF stream engine benchmarks. 1st International Workshop on Benchmarking RDF Systems (BeRSys 2013)
      - Dell'Aglio, D., Calbimonte, J.P., Balduini, M., Corcho, O., Della Valle, E.: On Correctness in RDF Stream Processor Benchmarking. International Semantic Web Conference (2) 2013: 326–342
      - Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: C-SPARQL: A continuous query language for RDF data streams. IJSC 4(1) (2010) 3–25
      - Calbimonte, J.P., Jeung, H., Corcho, O., Aberer, K.: Enabling Query Technologies for the Semantic Sensor Web. IJSWIS 8(1) (2012) 43–63
      - Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: ISWC (2011) 370–388
  22. Thank you! Questions?
      An Experience on Empirical Research about RDF Stream Processing
      Daniele Dell’Aglio (DEIB, Politecnico di Milano), daniele.dellaglio@polimi.it