On the need to include functional testing in RDF stream engine benchmarks

  1. On the need to include functional testing in RDF stream engine benchmarks
      Daniele Dell’Aglio, Marco Balduini, and Emanuele Della Valle
      1st International Workshop on Benchmarking RDF Systems (BeRSys 2013), co-located with ESWC 2013
      May 26th, 2013, Montpellier, France
  2. Agenda
      - Background on Data Stream Management Systems (DSMS)
      - RDF Stream Engines
      - Benchmarking RDF Stream Engines
      - Operational semantics of RDF Stream Engines
      - Testing the correctness of continuous query results
      - Conclusions
      BeRSys 2013 - May 26, 2013. Emanuele Della Valle - http://streamreasoning.org
  3. Background: Data Stream Management Systems. What are data streams?
      Formally: data streams are unbounded sequences of time-varying data elements.
      Less formally: an (almost) "continuous" flow of information, with the most recent information being more relevant because it describes the current state of a dynamic system.
      [Figure: time axis]
  4. Background: Data Stream Management Systems. The nature of streams requires a paradigmatic change*
      - from persistent data, to be stored and queried on demand (a.k.a. one-time semantics),
      - to transient data, to be consumed on the fly by continuous queries (a.k.a. continuous semantics).
      * This paradigmatic change first arose in the DB community.
  5. Background: Data Stream Management Systems. Continuous queries are registered over streams that, in most cases, are observed through windows.
      [Figure: input streams → window (stream-to-relation operators) → registered continuous query, expressed using relation-to-relation operators → stream of answers produced by the relation-to-stream operators]
  6. Background: Data Stream Management Systems. Types of windows (a.k.a. stream-to-relation operators):
      - physical: a given number of data elements;
      - logical: a variable number of data elements that occur during a given time interval (e.g., 1 hour);
      - sliding: progressively advanced by a given STEP (e.g., 5 minutes);
      - tumbling: advanced by exactly their time interval.
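To make the window types concrete, here is a minimal Python sketch. The function names are hypothetical, times are plain integers, and logical scopes are assumed to be half-open intervals [start, start + range); real engines may differ in where the first window starts and in how boundaries are treated.

```python
from typing import List, Tuple


def logical_window_scopes(t0: int, size: int, step: int, t_end: int) -> List[Tuple[int, int]]:
    """Half-open scopes [start, start + size) of a time-based (logical) window.

    The window is sliding when step < size and tumbling when step == size.
    Only windows that open before t_end are listed.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    scopes = []
    start = t0
    while start < t_end:
        scopes.append((start, start + size))
        start += step
    return scopes


def physical_windows(elements: list, size: int, step: int) -> List[list]:
    """Count-based (physical) windows of `size` consecutive elements, advanced by `step`."""
    return [elements[i:i + size] for i in range(0, len(elements) - size + 1, step)]


# Logical, sliding: RANGE 60 minutes, STEP 5 minutes (times in minutes).
print(logical_window_scopes(0, 60, 5, 15))   # [(0, 60), (5, 65), (10, 70)]
# Logical, tumbling: STEP equals RANGE.
print(logical_window_scopes(0, 10, 10, 30))  # [(0, 10), (10, 20), (20, 30)]
# Physical, sliding by one element.
print(physical_windows(["a", "b", "c", "d", "e"], 3, 1))
# [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
```

The physical/logical distinction (count vs. time) is orthogonal to the sliding/tumbling one (step smaller than vs. equal to the range), which is why both functions take a separate step parameter.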
  7. Background: RDF Stream Engines. RDF Stream Engines port DSMS concepts into the Semantic Web, extending:
      - the RDF data model with the notion of RDF stream:
          … <s_i p_i o_i> : [τ_i]  <s_{i+1} p_{i+1} o_{i+1}> : [τ_{i+1}] …
        where timestamps are non-decreasing to allow for expressing contemporaneity;
      - SPARQL, to express and process continuous queries.
      Existing languages/engines: CQELS, SPARQLStream, C-SPARQL.
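A minimal Python sketch of an RDF stream as a sequence of timestamped triples, with a check for the non-decreasing timestamp constraint. The class and function names are hypothetical and timestamps are plain integers; this is only an illustration of the data model, not any engine's API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StreamElement:
    """One RDF stream element <s p o> : [tau]."""
    s: str
    p: str
    o: str
    tau: int


def has_non_decreasing_timestamps(stream: Iterable[StreamElement]) -> bool:
    """True if no timestamp is smaller than the one before it.

    Equal timestamps are allowed: they model contemporaneous triples.
    """
    previous = None
    for element in stream:
        if previous is not None and element.tau < previous:
            return False
        previous = element.tau
    return True


stream = [
    StreamElement(":s1", ":p", ":o1", 1),
    StreamElement(":s2", ":p", ":o2", 1),  # contemporaneous with the previous element
    StreamElement(":s3", ":p", ":o3", 2),
]
print(has_non_decreasing_timestamps(stream))        # True
print(has_non_decreasing_timestamps(stream[::-1]))  # False
```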
  8–9. Background: e.g., where C-SPARQL extends SPARQL
  10. Background: an example of a C-SPARQL query. Who are the opinion makers, i.e., the users who are likely to influence the behaviour of other users who follow them?
          REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
          CONSTRUCT { ?opinionMaker sd:about ?resource }
          FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
          WHERE {
            ?opinionMaker ?opinion ?resource .
            ?follower sioc:follows ?opinionMaker .
            ?follower ?opinion ?resource .
            FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
                     && ?opinion != sd:accesses )
  11. Background: the same C-SPARQL query, annotated:
      - REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS: query registration (for continuous execution), with RDF stream added as a new output format;
      - FROM STREAM clause, with the WINDOW [RANGE 30m STEP 5m];
      - cs:timestamp: built-in to access timestamps;
      - aggregates as in SPARQL 1.1.
  12. Background: benchmarking RDF stream engines
      - SRBench. Dataset: LinkedSensorData (real meteorological sensor data). Queries: 17 continuous queries, some requiring RDFS reasoning. KPIs: feature coverage and correctness (not verified).
      - LSBench. Dataset: synthetic data set inspired by social networks. Queries: 12 continuous queries involving multiple streams and static knowledge. KPIs: input throughput and correctness (verified by comparing the number of results produced by different …).
  13. Is verifying correctness hard?
      - Not for SPARQL: http://www.w3.org/2009/sparql/docs/tests/ provides queries + expected results.
      - However, it is hard for continuous (SPARQL) queries: 1 query → multiple correct results. Input data and query are not enough to determine the correct result.
  14. A simple test. Take the motivating scenario of CQELS: there are two connected rooms, r1 and r2; each room has a sensor able to detect the individuals inside it, m1 and m2. The stream ST contains the following triples:
      S1 = <:m1 :detectedAt :r1> : [1]
      S2 = <:m2 :detectedAt :r1> : [3]
      S3 = <:m1 :detectedAt :r2> : [12]
      S4 = <:m2 :detectedAt :r2> : [15]
  15. The query of the simple test. We want to know when the two individuals m1 and m2 are in the same room, using a time-based tumbling window of 10 seconds.
          REGISTER QUERY SimpleTest AS
          SELECT ?room
          FROM STREAM <http://ex.org/ST> [RANGE 10s STEP 10s]
          WHERE {
            :m1 :detectedAt ?room .
            :m2 :detectedAt ?room
          }
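A minimal Python sketch of what SimpleTest computes on each window, assuming the first window opens at t0 = 0 and scopes are half-open [start, start + 10). The solution sequence is simplified to a set of rooms, and all names are illustrative, not part of any engine.

```python
from typing import Dict, List, Set, Tuple

# Stream ST of the simple test as (subject, predicate, object, timestamp) tuples.
ST = [
    (":m1", ":detectedAt", ":r1", 1),
    (":m2", ":detectedAt", ":r1", 3),
    (":m1", ":detectedAt", ":r2", 12),
    (":m2", ":detectedAt", ":r2", 15),
]


def window_content(stream, start: int, end: int) -> List[Tuple[str, str, str]]:
    """Triples whose timestamp falls in the half-open scope [start, end)."""
    return [(s, p, o) for (s, p, o, t) in stream if start <= t < end]


def simple_test(content) -> Set[str]:
    """SELECT ?room WHERE { :m1 :detectedAt ?room . :m2 :detectedAt ?room }"""
    rooms_m1 = {o for (s, p, o) in content if s == ":m1" and p == ":detectedAt"}
    rooms_m2 = {o for (s, p, o) in content if s == ":m2" and p == ":detectedAt"}
    return rooms_m1 & rooms_m2  # join on ?room


def run_tumbling(stream, t0: int, size: int, t_end: int) -> Dict[Tuple[int, int], Set[str]]:
    """Evaluate the query once per tumbling window [start, start + size) opened before t_end."""
    results = {}
    for start in range(t0, t_end, size):  # tumbling: STEP == RANGE
        results[(start, start + size)] = simple_test(window_content(stream, start, start + size))
    return results


print(run_tumbling(ST, t0=0, size=10, t_end=20))
# {(0, 10): {':r1'}, (10, 20): {':r2'}}
```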
  16. All the results you can obtain. Running the test in C-SPARQL, CQELS and SPARQLStream, the following results can be obtained:
      [Table of results not recoverable]
      Are they all correct? How can this be?
  17. These engines have different operational semantics!
  18. The devil is in the details!
      S1 = <:m1 :detectedAt :r1> : [1]
      S2 = <:m2 :detectedAt :r1> : [3]
      S3 = <:m1 :detectedAt :r2> : [12]
      S4 = <:m2 :detectedAt :r2> : [15]
      [Figure: timeline of stream ST with S1–S4 at t = 1, 3, 12, 15 and windows W0–W6]
  19. Can the operational semantics of RDF Stream Engines be modelled? A model has been proposed to explain the differences that appear between different DSMSs: SECRET. The results of a DSMS depend not only on the input and the query, but also on the system.
  20. ScopE in the SECRET model. It is the time range [t_open, t_close) of the active window; it is determined using the size ω and slide β parameters of the window, as written by the query issuer.
      [Figure: windows W1–W3 over t = 1…7 from t_app, marked closed, open and active; ω = 3, β = 2]
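The scope can be written as a small function. This Python sketch assumes that window i opens at t0 + iβ (t0 is system dependent, as the Content slide points out) and only lists which windows contain a given instant; it does not model how SECRET singles out the active window among them. The function names are hypothetical.

```python
from typing import List, Tuple


def scope(i: int, t0: int, omega: int, beta: int) -> Tuple[int, int]:
    """Scope [t_open, t_close) of the i-th window, assuming window i opens at t0 + i * beta."""
    t_open = t0 + i * beta
    return (t_open, t_open + omega)


def windows_containing(t: int, t0: int, omega: int, beta: int) -> List[int]:
    """Indices i >= 0 of the windows whose scope contains time t.

    Window i contains t iff t0 + i*beta <= t < t0 + i*beta + omega.
    """
    if t < t0:
        return []
    last = (t - t0) // beta                       # most recent window already opened
    first = max(0, (t - t0 - omega) // beta + 1)  # oldest window not yet closed
    return list(range(first, last + 1))


# The slide's parameters, omega = 3 and beta = 2, with the first window assumed to open at t0 = 1.
print([scope(i, t0=1, omega=3, beta=2) for i in range(3)])  # [(1, 4), (3, 6), (5, 8)]
print(windows_containing(5, t0=1, omega=3, beta=2))          # [1, 2]
```

Because β < ω here, consecutive windows overlap, so an instant such as t = 5 belongs to two open windows at once.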
  21. Content in the SECRET model. It is the subset of the stream included in the active window. It is determined using the size ω and slide β parameters of the window and t0, the time instant at which the first window starts.
      [Figure: windows W0–W3 over stream ST (t = 1, 3, 12, 15) with ω = β = 10, for different values of t0]
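To see why t0 matters, the following Python sketch recomputes the window contents of the simple test (ω = β = 10) for a few values of t0 and evaluates the query on each. The t0 values are arbitrary illustrations, not the ones used by any particular engine, and triples arriving before t0 simply fall outside every window in this sketch.

```python
from typing import Dict, List, Tuple

# Stream ST of the simple test as (individual, room, timestamp); every triple uses :detectedAt.
ST = [(":m1", ":r1", 1), (":m2", ":r1", 3), (":m1", ":r2", 12), (":m2", ":r2", 15)]


def contents(stream, t0: int, omega: int, beta: int, t_end: int) -> Dict[Tuple[int, int], list]:
    """Content of every window [t0 + i*beta, t0 + i*beta + omega) that opens before t_end."""
    return {
        (start, start + omega): [(who, room) for (who, room, t) in stream if start <= t < start + omega]
        for start in range(t0, t_end, beta)
    }


def same_room(content) -> List[str]:
    """Rooms where both :m1 and :m2 were detected inside the window."""
    seen = {}
    for who, room in content:
        seen.setdefault(room, set()).add(who)
    return sorted(room for room, people in seen.items() if {":m1", ":m2"} <= people)


for t0 in (0, 2, 3):  # hypothetical choices of t0: same stream, same query
    answers = {s: same_room(c) for s, c in contents(ST, t0, 10, 10, 20).items()}
    print(t0, answers)
# 0 {(0, 10): [':r1'], (10, 20): [':r2']}
# 2 {(2, 12): [], (12, 22): [':r2']}
# 3 {(3, 13): [], (13, 23): []}
```

Same input and same query yield three different answers, which is why input data and query alone do not determine the correct result.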
  22. Report in the SECRET model. It defines the conditions under which the window contents become visible for further query evaluation and result reporting. It can be a logical combination of the following:
      - content change;
      - window close;
      - non-empty content;
      - periodic.
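The four report conditions can be modelled as predicates that are combined with AND/OR. This is a minimal Python sketch: the `Snapshot` type and all function names are hypothetical, and it models only the window-content conditions listed above (a "non-empty result" condition, which depends on the query answer, would need the query evaluated first).

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view of the active window at an evaluation instant t."""
    t: int
    content: FrozenSet[str]           # elements currently in the window
    previous_content: FrozenSet[str]  # elements at the previous evaluation instant
    window_closes: bool               # True if the active window closes at t


Condition = Callable[[Snapshot], bool]


def content_change(x: Snapshot) -> bool:
    return x.content != x.previous_content


def window_close(x: Snapshot) -> bool:
    return x.window_closes


def non_empty_content(x: Snapshot) -> bool:
    return bool(x.content)


def periodic(period: int, t0: int = 0) -> Condition:
    return lambda x: (x.t - t0) % period == 0


def all_of(*conditions: Condition) -> Condition:
    """Logical AND of report conditions."""
    return lambda x: all(c(x) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Logical OR of report conditions."""
    return lambda x: any(c(x) for c in conditions)


report = all_of(window_close, non_empty_content)
x = Snapshot(t=10, content=frozenset({"S1", "S2"}),
             previous_content=frozenset({"S1", "S2"}), window_closes=True)
print(report(x))          # True
print(content_change(x))  # False
```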
  23. Explaining the results of C-SPARQL. The reporting strategy of C-SPARQL is window close and non-empty result.
      [Figure: timeline of ST (S1–S4 at t = 1, 3, 12, 15) with windows W0–W6, ω = β = 10]
  24. Explaining the results of SPARQLStream. The reporting strategy of SPARQLStream is window close.
      [Figure: timeline of ST (S1–S4 at t = 1, 3, 12, 15) with windows W0–W6, ω = β = 10]
  25. Explaining the results of CQELS. The reporting strategy of CQELS is content change and non-empty result.
      [Figure: timeline of ST (S1–S4 at t = 1, 3, 12, 15) with windows W0–W6, ω = β = 10]
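A toy Python replay of the three reporting strategies named on the last three slides, applied to the simple test. It assumes t0 = 0, a tumbling 10 s window with half-open scopes, that "content change" is triggered only by arriving triples, and that the query is re-evaluated from scratch at each report. It illustrates the strategies; it is not a reproduction of the actual output of C-SPARQL, SPARQLStream or CQELS.

```python
from typing import List, Set, Tuple

# Simple-test stream as (individual, room, timestamp); all triples use :detectedAt.
ST = [(":m1", ":r1", 1), (":m2", ":r1", 3), (":m1", ":r2", 12), (":m2", ":r2", 15)]


def same_room(content) -> Set[str]:
    """SELECT ?room WHERE { :m1 :detectedAt ?room . :m2 :detectedAt ?room }"""
    m1 = {room for who, room, _ in content if who == ":m1"}
    m2 = {room for who, room, _ in content if who == ":m2"}
    return m1 & m2


def simulate(strategy: str, omega: int = 10, t0: int = 0, t_end: int = 30) -> List[Tuple[int, List[str]]]:
    """(report time, rooms) pairs for a tumbling window under one reporting strategy."""
    arrivals = {t for _, _, t in ST}
    closes = set(range(t0 + omega, t_end + 1, omega))
    reports = []
    for t in sorted(arrivals | closes):
        if t in closes and strategy in ("window_close", "window_close_non_empty"):
            result = same_room([e for e in ST if t - omega <= e[2] < t])  # window [t - omega, t)
            if strategy == "window_close" or result:
                reports.append((t, sorted(result)))
        if t in arrivals and strategy == "content_change_non_empty":
            start = t0 + (t - t0) // omega * omega  # window currently containing t
            result = same_room([e for e in ST if start <= e[2] <= t])
            if result:
                reports.append((t, sorted(result)))
    return reports


for s in ("window_close_non_empty", "window_close", "content_change_non_empty"):
    print(s, simulate(s))
# window_close_non_empty [(10, [':r1']), (20, [':r2'])]
# window_close [(10, [':r1']), (20, [':r2']), (30, [])]
# content_change_non_empty [(3, [':r1']), (15, [':r2'])]
```

Even in this toy setting, the strategies differ in when answers appear (at t = 3 vs. t = 10) and in whether empty answers are emitted.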
  26. And so what?
  27. Let's test correctness!
      Data:
        S1 = <:m1 :detectedAt :r1> : [0]
        S2 = <:m2 :detectedAt :r2> : [5]
        S3 = <:m3 :detectedAt :r1> : [10]
        S4 = <:m4 :detectedAt :r2> : [15]
      Query:
          SELECT ?room WHERE {
            STREAM <http://ex.org/s1> [RANGE 3s SLIDE 3s] {
              ?p1 :detectedAt ?room .
              ?p2 :detectedAt ?room
            }
            FILTER (?p1 != ?p2)
          }
      [Figure: timeline of S1–S4 at t = 0, 5, 10, 15, with ω = β = 3]
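The expected answer of this test can be computed independently of any engine, which is what an "oracle" needs. A minimal Python sketch, assuming tumbling 3 s windows with half-open scopes starting at some t0; the function names are hypothetical. It also covers the variant without the FILTER.

```python
from typing import Dict, List, Tuple

# Test data as (person, room, timestamp); all triples use :detectedAt.
S = [(":m1", ":r1", 0), (":m2", ":r2", 5), (":m3", ":r1", 10), (":m4", ":r2", 15)]


def evaluate(content, use_filter: bool = True) -> List[str]:
    """Bag of ?room solutions of { ?p1 :detectedAt ?room . ?p2 :detectedAt ?room } FILTER (?p1 != ?p2)."""
    rooms = []
    for p1, room1, _ in content:
        for p2, room2, _ in content:
            if room1 == room2 and (not use_filter or p1 != p2):
                rooms.append(room1)
    return sorted(rooms)


def expected(t0: int, omega: int = 3, t_end: int = 18, use_filter: bool = True) -> Dict[Tuple[int, int], List[str]]:
    """Expected answer for every tumbling window [start, start + omega) opening before t_end."""
    return {
        (start, start + omega): evaluate([e for e in S if start <= e[2] < start + omega], use_filter)
        for start in range(t0, t_end, omega)
    }


# Triples are 5 s apart and windows last 3 s, so no window ever holds two triples:
# with the FILTER, every window should produce an empty answer, whatever t0 is.
for t0 in (0, 1, 2):
    assert all(answer == [] for answer in expected(t0).values())

print(expected(0, use_filter=False))
# {(0, 3): [':r1'], (3, 6): [':r2'], (6, 9): [], (9, 12): [':r1'], (12, 15): [], (15, 18): [':r2']}
```

Any room returned by an engine for the filtered query therefore signals that two triples shared a window, e.g. because a triple was not removed when its window closed.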
  28. Results of the test
      [Results not recoverable]
      Why?
  29. Trying to make sense of it… Same data, query and timeline as in the previous test (ω = β = 3). Let's remove this filter from the query:
          FILTER (?p1 != ?p2)
  30. Results of the new test. Is this caused by incorrect removal of the triples from the window?
  31. Does throughput matter without correctness?
  32. Conclusions
      - The different operational semantics of existing RDF stream engines affect the outputs and the performance of those systems.
      - Throughput measurements must be performed while testing correctness.
      - Modeling RDF stream engines using SECRET allows for checking correctness.
      - SRBench and LSBench should be extended with an "oracle" that checks correctness.
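One way such an oracle could compare a run against expectations is sketched below in Python. It assumes the expected reports have already been derived (e.g. from a SECRET configuration of the engine under test), that report times must match exactly, and that solutions within a report are order-insensitive; every name here is hypothetical and not part of SRBench or LSBench.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Reports = Iterable[Tuple[int, Iterable[str]]]  # (report time, solutions)


def as_multiset(reports: Reports) -> Counter:
    """Order-insensitive view of a run: one entry per (time, sorted solutions) report."""
    return Counter((t, tuple(sorted(solutions))) for t, solutions in reports)


def oracle(expected: Reports, actual: Reports) -> Dict[str, List]:
    """Reports the engine missed and reports it produced that the model does not predict."""
    exp, act = as_multiset(expected), as_multiset(actual)
    return {
        "missing": sorted((exp - act).elements()),
        "unexpected": sorted((act - exp).elements()),
    }


def is_correct(expected: Reports, actual: Reports) -> bool:
    diff = oracle(expected, actual)
    return not diff["missing"] and not diff["unexpected"]


expected = [(10, [":r1"]), (20, [":r2"])]  # e.g. derived from a SECRET configuration
actual = [(10, [":r1"]), (20, [":r1"])]    # hypothetical engine output
print(oracle(expected, actual))
# {'missing': [(20, (':r2',))], 'unexpected': [(20, (':r1',))]}
print(is_correct(expected, expected))      # True
```

Running such a check in the same run that measures throughput would tie each throughput figure to a correctness verdict.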
  33. Thank you for your attention: questions?
      Daniele Dell’Aglio, Marco Balduini, and Emanuele Della Valle
