
An Experience on Empirical Research about RDF Stream Processing


The invited talk I gave at the EMPIRICAL 2014 workshop at the ESWC 2014


  1. An Experience on Empirical Research about RDF Stream Processing
     Dipartimento di Elettronica, Informazione e Bioingegneria
     Daniele Dell’Aglio (daniele.dellaglio@polimi.it)
     Joint work with: Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle
  2. RDF Stream Processing in a nutshell
     • Continuous queries over RDF streams, i.e. infinite sequences of time-stamped RDF statements
     • Brings together the DSMS/CEP and Semantic Web research fields
     • Several prototypes, with similar models, are available today
     • There is a trend towards evaluating and comparing the existing systems
     Daniele Dell'Aglio, Experimental research about RSPs. 26 May 2014, EMPIRICAL@ESWC2014
  3. The CQL model for RSPs
     Input stream → S2R operator → R2R operator → R2S operator → Output stream
     • S2R operator: converts the infinite stream of RDF elements into a finite set of mappings
       - Window operators: time-based, tuple-based, …
     • R2R operator: transforms a set of mappings into another set of mappings
       - SPARQL 1.0/1.1 queries
     • R2S operator: each set of mappings produced by the R2R operator is transformed and appended to the output stream
       - Operators: RStream, DStream, IStream
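     The three R2S operators named on this slide can be sketched as set operations over two consecutive evaluation results. This is a minimal illustration of the usual CQL reading (RStream emits the whole current result, IStream only what is new, DStream only what disappeared); the function names and the tuple encoding of mappings are assumptions made for the example, not code from any of the systems discussed.

```python
def rstream(previous, current):
    """RStream: emit every mapping in the current result."""
    return set(current)


def istream(previous, current):
    """IStream: emit only mappings that were not in the previous result."""
    return set(current) - set(previous)


def dstream(previous, current):
    """DStream: emit only mappings that were in the previous result and are now gone."""
    return set(previous) - set(current)


# Hypothetical results of two consecutive evaluations, encoded as (person, room).
previous = {("alice", "hall"), ("bob", "hall")}
current = {("alice", "kitchen"), ("bob", "hall")}

print(rstream(previous, current))
print(istream(previous, current))
print(dstream(previous, current))
```

     With these inputs, IStream reports only Alice's new position and DStream only her old one, while RStream repeats Bob's unchanged position as well.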
  4. S2R: Time-based sliding window
     [Figure: a stream S of elements S1 … S12 on the time axis t, with a sliding window W(ω,β) of width ω and slide β that feeds the R2R operator]
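     A time-based sliding window W(ω,β) can be sketched as a sequence of scopes, each opening β time units after the previous one and staying open for ω time units. The sketch below assumes half-open scopes [open, close) and an explicit starting instant `t0`; both are modelling choices made for the example, and the stream used in the example is hypothetical.

```python
def window_scopes(t0, width, slide, count):
    """Return the first `count` window scopes as (open, close) pairs."""
    return [(t0 + i * slide, t0 + i * slide + width) for i in range(count)]


def window_content(stream, scope):
    """Select the items whose timestamp falls in the half-open scope [open, close)."""
    opens, closes = scope
    return [item for ts, item in stream if opens <= ts < closes]


# Hypothetical stream of (timestamp, element) pairs.
stream = [(1, "S1"), (3, "S2"), (6, "S3"), (9, "S4")]

scopes = window_scopes(t0=0, width=5, slide=2, count=3)
for scope in scopes:
    print(scope, window_content(stream, scope))
```

     Because the slide (2) is smaller than the width (5), consecutive windows overlap and the same element can appear in more than one window; setting the slide equal to the width gives a tumbling window.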
  5. Implementations (oversimplified!)
     • C-SPARQL
       - RDF Store + Stream processor
       - [Diagram labels: Continuous query, translator, RDF Store, Stream processor, continuous results]
  6. Implementations (oversimplified!)
     • C-SPARQL
       - RDF Store + Stream processor
       - [Diagram labels: Continuous query, translator, RDF Store, Stream processor, continuous results]
     • CQELS
       - Implemented from scratch. Focus on performance
       - [Diagram labels: Continuous query, Native RSP, continuous results]
  7. Implementations (oversimplified!)
     • C-SPARQL
       - RDF Store + Stream processor
       - [Diagram labels: Continuous query, translator, RDF Store, Stream processor, continuous results]
     • CQELS
       - Implemented from scratch. Focus on performance
       - [Diagram labels: Continuous query, Native RSP, continuous results]
     • SPARQLstream
       - Ontology-based stream query answering
       - [Diagram labels: Continuous query, rewriter, R2RML mappings, DSMS/CEP, continuous results]
  8. Same inputs, different outputs…
     • Let’s consider the following stream:
       [Figure: stream S with elements S1, S2, S3, S4 at t = 1, 3, 6, 9; triples shown: :alice :isIn :hall, :bob :isIn :hall, :alice :isIn :kitchen, :bob :isIn :kitchen; the window width and slide are marked on the time axis]
     • And the continuous query:
       - Where are Alice and Bob, when they are together?
       - With a tumbling window W(ω=β=5)
     • After 4 executions:

       Execution   1st answer   2nd answer
       1           :hall [6]    :kitchen [11]
       2           :hall [5]    :kitchen [10]
       3           :hall [6]    :kitchen [11]
       4           - [7]        - [12]
  9. The first hypothesis
     • All three systems show similar behaviours
     • Intuition: one or more parameters are not taken into account by the model
     • As a consequence, the implementations can output different correct answers
  10. The first hypothesis
      • HP1: it is possible to have a unique correct answer if we can control the time instant at which the sliding window operator starts to work (t0)
      [Figure: the stream S (S1 … S4 at t = 1, 3, 6, 9) with the triples :alice :isIn :hall, :alice :isIn :kitchen, :bob :isIn :hall, :bob :isIn :kitchen, and the windows obtained for t0=0, t0=1 and t0=2]
  11. The experiment
      • We work on the difference between the time instant at which the stream starts (ts) and the query registration time (tq)
        - At each execution, we check the result
        - We estimate the delay between tq and t0
      • Black box approach
        - we work on inputs/outputs
        - the source code of all the systems
      [Figure: timeline showing tq, ts and t0 around the RSP]
  12. Observation and explanation
      • As a result, for each system
        - We identified the value of the t0 parameter
        - We are able to produce the different results for each t0 value

        Exec   1st answer   2nd answer
        1      :hall [6]    :kitchen [11]
        2      :hall [5]    :kitchen [10]
        3      :hall [6]    :kitchen [11]
        4      - [7]        - [12]

        Window   1st answer   2nd answer
        t0=0     :hall [5]    :kitchen [10]
        t0=1     :hall [6]    :kitchen [11]
        t0=2     - [7]        - [12]

      • Is it enough to claim that hypothesis 1 holds?
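      The dependence of the answers on t0 can be reproduced with a small batch evaluation of the tumbling window W(ω=β=5). The sketch below is not the code used in the experiment. It assumes half-open windows [start, start+5), an answer reported at the closing instant of each window, and a pairing of triples with the timestamps 1, 3, 6 and 9 that is read off the figure and may not match the original slide exactly.

```python
from typing import Optional

# Assumed pairing of timestamps and triples: (time, person, room).
STREAM = [
    (1, "alice", "hall"),
    (3, "bob", "hall"),
    (6, "alice", "kitchen"),
    (9, "bob", "kitchen"),
]


def together(window) -> Optional[str]:
    """Return the room in which both Alice and Bob appear in the window, if any."""
    people_by_room = {}
    for _, person, room in window:
        people_by_room.setdefault(room, set()).add(person)
    for room, people in people_by_room.items():
        if {"alice", "bob"} <= people:
            return room
    return None


def answers(t0, width=5, n_windows=2):
    """Evaluate the query on consecutive tumbling windows starting at t0.

    Each result is (closing instant, room or None).
    """
    results = []
    for i in range(n_windows):
        start = t0 + i * width
        end = start + width
        content = [element for element in STREAM if start <= element[0] < end]
        results.append((end, together(content)))
    return results


for t0 in (0, 1, 2):
    print(f"t0={t0}: {answers(t0)}")
```

      Under these assumptions the three values of t0 yield the three rows of the second table: the hall and the kitchen at instants 5 and 10, the same rooms at 6 and 11, and no answer at 7 and 12.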
  13. Some considerations on the experiment
      • Comparison:
        - We ran the experiment multiple times to collect instances and check them
      • Reproducibility: can other researchers reproduce the experiment?
        - We released both the code and the data used for the experiment (see http://streamreasoning.org/Benchmarks/)
      • Repeatability: is the result universally valid?
        - We changed inputs (streams and queries) and OS/JVM to verify whether the hypothesis holds
        - We repeated the experiment with different implementations (C-SPARQL, CQELS, etc.)
  14. Something more on repeatability…
      • We made some assumptions on the setting
      [Figure: extensions of the basic S2R, R2R, R2S setting: from single to multi window, from single to multi stream, reasoning, static knowledge, multiple queries]
  15. A more complex problem…
      • As a “side effect” of the first experiment, we discovered that the results of different systems are not the same:

        C-SPARQL
        Exec   1st answer   2nd answer
        1      :hall [6]    :kitchen [11]
        2      :hall [5]    :kitchen [10]
        3      :hall [6]    :kitchen [11]
        4      - [7]        - [12]

        CQELS
        Exec   1st answer   2nd answer
        1      :hall [3]    :kitchen [9]
        2      No answers
        3      :hall [3]    :kitchen [9]
        4      No answers

      • Intuition: t0 is not the only parameter our model lacks
  16. The SECRET framework
      [Figure: a stream S of elements S1 … S12 on the time axis t, with a sliding window W(ω,β) that feeds the R2R operator, annotated with the parameters below]
      • t0: When does the window start? (internal window parameter)
      • TICK: When are data stream elements added to the window? Triple-based vs graph-based
      • REPORT: When is the window content made available to the R2R operator? Non-empty content, Content-change, Window-close, Periodic
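      The four REPORT strategies listed on this slide can be sketched as boolean conditions that an engine checks at each time instant. The sketch assumes that an engine combines its strategies by conjunction (it reports only when all of them hold); the `Report` enum, the `should_report` function, and its parameters are hypothetical names introduced for the example.

```python
from enum import Enum, auto


class Report(Enum):
    NON_EMPTY = auto()       # report only if the window holds at least one element
    CONTENT_CHANGE = auto()  # report only if the content differs from the last one seen
    WINDOW_CLOSE = auto()    # report only at the instant the window closes
    PERIODIC = auto()        # report only at regular intervals counted from t0


def should_report(strategies, now, window_close, content, previous_content,
                  period=1, t0=0):
    """Return True if every strategy in `strategies` allows a report at `now`."""
    checks = {
        Report.NON_EMPTY: len(content) > 0,
        Report.CONTENT_CHANGE: content != previous_content,
        Report.WINDOW_CLOSE: now == window_close,
        Report.PERIODIC: (now - t0) % period == 0,
    }
    return all(checks[strategy] for strategy in strategies)


# A hypothetical engine that reports non-empty windows when they close.
on_close = {Report.WINDOW_CLOSE, Report.NON_EMPTY}
print(should_report(on_close, now=5, window_close=5,
                    content=["S1", "S2"], previous_content=[]))

# A hypothetical engine that reports whenever the window content changes.
on_change = {Report.CONTENT_CHANGE}
print(should_report(on_change, now=3, window_close=5,
                    content=["S1", "S2"], previous_content=["S1"]))
```

      Two engines configured like this would report at different instants on the same stream, which is the kind of difference that t0 alone cannot explain.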
  17. SECRET and RSPs
      • HP2: given an input stream, a query, the value of t0 and a description of the RSP w.r.t. SECRET, we can determine the answer that will be provided by the system
      • To investigate it, we built software that computes the expected answer in batch and matches it against the one produced by the RSP
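      The matching step can be sketched as a comparison between a predicted sequence of answers and the sequence observed from a system. This is an illustrative sketch only, not the interface of the actual oracle. The function name and the (room, instant) encoding are assumptions, while the sample values are the execution 1 rows of the C-SPARQL and CQELS tables shown earlier in the deck.

```python
def first_mismatch(predicted, observed):
    """Return the index of the first differing answer, or None if the sequences match."""
    for index, (expected, actual) in enumerate(zip(predicted, observed)):
        if expected != actual:
            return index
    if len(predicted) != len(observed):
        # One sequence is a strict prefix of the other.
        return min(len(predicted), len(observed))
    return None


# Answers predicted in batch for a window that starts at t0=1 and reports on close.
predicted = [(":hall", 6), (":kitchen", 11)]

# Answers observed in execution 1 of each system.
observed_csparql = [(":hall", 6), (":kitchen", 11)]
observed_cqels = [(":hall", 3), (":kitchen", 9)]

print(first_mismatch(predicted, observed_csparql))
print(first_mismatch(predicted, observed_cqels))
```

      A mismatch like the second one indicates either that the prediction used the wrong description of the system or that the system has a defect; telling the two apart requires inspecting the individual case.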
  18. Observation and analysis
      • We prepared a set of seven queries (to stress different parts of the sliding window)
      • We ran each query multiple times
      • Most of the time, we can foresee the answer that will be provided
      [Table: one row per query Q1 to Q7 and one column per system (CQELS, C-SPARQL, SPARQLstream); the cell values are not preserved]
  19. Observation and analysis
      • We investigated the observations where there was no match, and we discovered that they were due to errors in the implementations, such as:
        - Initialization
        - Slide parameter
        - Window contents
        - Internal timestamp management
      • Conclusion: HP2 seems to be valid in the considered setting
  20. CSR-bench
      • The main outcome of our experience is CSR-bench, an extension of the SRBench benchmark
        - More info at http://www.w3.org/wiki/CSRBench
      • Three main components:
        - A common model for the RDF stream processor operational semantics
        - An oracle (an automatic correctness validator), available at https://github.com/dellaglio/csrbench-oracle
        - A test suite
  21. References
      • Daniele Dell'Aglio, Marco Balduini, Emanuele Della Valle: On the need to include functional testing in RDF stream engine benchmarks. 1st International Workshop on Benchmarking RDF Systems (BeRSys2013)
      • Daniele Dell'Aglio, Jean-Paul Calbimonte, Marco Balduini, Óscar Corcho, Emanuele Della Valle: On Correctness in RDF Stream Processor Benchmarking. International Semantic Web Conference (2) 2013: 326-342
      • Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: C-SPARQL: A continuous query language for RDF data streams. IJSC 4(1) (2010) 3–25
      • Calbimonte, J.P., Jeung, H., Corcho, O., Aberer, K.: Enabling Query Technologies for the Semantic Sensor Web. IJSWIS 8(1) (2012) 43–63
      • Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: ISWC. (2011) 370–388
  22. Thank you! Questions?
      An Experience on Empirical Research about RDF Stream Processing
      Daniele Dell’Aglio (DEIB, Politecnico di Milano)
      daniele.dellaglio@polimi.it
