On the need to include functional testing in RDF stream engine benchmarks
Transcript

  • 1. On the need to include functional testing in RDF stream engine benchmarks
       Daniele Dell’Aglio, Marco Balduini, and Emanuele Della Valle
       1st International Workshop on Benchmarking RDF Systems (BeRSys 2013), co-located with ESWC 2013
       May 26th, 2013, Montpellier, France
  • 2. Agenda
       Background on:
         Data Stream Management Systems (DSMS)
         RDF Stream Engines
         Benchmarking RDF Stream Engines
       Operational semantics of RDF Stream Engines
       Testing the correctness of continuous query results
       Conclusions
       BeRSys 2013 - May 26, 2013. Emanuele Della Valle - http://streamreasoning.org
  • 3. Background: Data Stream Management Systems
       What are data streams?
       Formally: data streams are unbounded sequences of time-varying data elements.
       Less formally: an (almost) "continuous" flow of information, with the recent information being more relevant as it describes the current state of a dynamic system.
       [Figure: data elements along a time axis]
  • 4. Background: Data Stream Management Systems
       The nature of streams requires a paradigmatic change*:
         from persistent data, to be stored and queried on demand (a.k.a. one-time semantics)
         to transient data, to be consumed on the fly by continuous queries (a.k.a. continuous semantics)
       * This paradigmatic change first arose in the DB community.
  • 5. Background: Data Stream Management Systems
       Continuous queries are registered over streams that, in most cases, are observed through windows.
       [Figure labels: input streams; window (stream-to-relation operators); registered continuous query, expressed using relation-to-relation operators; streams of answers, produced by the relation-to-stream operators]
  • 6. Background: Data Stream Management Systems
       Types of windows (a.k.a. stream-to-relation operators):
         physical: a given number of data elements
         logical: a variable number of data elements which occur during a given time interval (e.g., 1 hour)
           sliding: they are progressively advanced by a given STEP (e.g., 5 minutes)
           tumbling: they are advanced by exactly their time interval
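The window types on slide 6 can be sketched in a few lines of Python. This is only an illustration: the function names, the integer time unit, and the half-open `[open, close)` convention are assumptions made here, not something the slide prescribes.

```python
from typing import Iterator, List, Tuple


def logical_windows(t0: int, size: int, step: int, until: int) -> Iterator[Tuple[int, int]]:
    """Yield the [open, close) scope of every logical window that opens before `until`.

    t0 is the instant at which the first window opens (an assumed parameter).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    start = t0
    while start < until:
        yield (start, start + size)
        start += step


def tumbling_windows(t0: int, size: int, until: int) -> Iterator[Tuple[int, int]]:
    """A tumbling window is a logical window whose step equals its size."""
    return logical_windows(t0, size, size, until)


def physical_window(elements: List, n: int) -> List:
    """A physical window keeps the last n data elements, whatever their timestamps."""
    return elements[-n:] if n > 0 else []


# Sliding: 60-minute range advanced by a 5-minute step.
sliding = list(logical_windows(0, 60, 5, 15))
# Tumbling: 60-minute range advanced by 60 minutes.
tumbling = list(tumbling_windows(0, 60, 180))
```

Sliding windows overlap whenever the step is smaller than the size, so one data element can belong to several windows; tumbling windows partition the timeline.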
  • 7. Background: RDF Stream Engines
       RDF Stream Engines port DSMS concepts into the Semantic Web, extending:
         the RDF data model with the notion of RDF Stream:
           …
           <s_i p_i o_i> : [τ_i]
           <s_{i+1} p_{i+1} o_{i+1}> : [τ_{i+1}]
           …
           (timestamps are non-decreasing, to allow for expressing contemporaneity)
         SPARQL, to express and process continuous queries
       Existing languages/engines: CQELS, SPARQLstream, C-SPARQL
  • 8. Background: e.g., where C-SPARQL extends SPARQL
  • 9. Background: e.g., where C-SPARQL extends SPARQL
  • 10. Background: An example of C-SPARQL query
       Who are the opinion makers? I.e., the users who are likely to influence the behaviour of other users who follow them.
         REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
         CONSTRUCT { ?opinionMaker sd:about ?resource }
         FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
         WHERE {
           ?opinionMaker ?opinion ?resource .
           ?follower sioc:follows ?opinionMaker .
           ?follower ?opinion ?resource .
           FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
                    && ?opinion != sd:accesses )
         }
  • 11. Background: An example of C-SPARQL query
       Who are the opinion makers? I.e., the users who are likely to influence the behaviour of other users who follow them.
         REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
         CONSTRUCT { ?opinionMaker sd:about ?resource }
         FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
         WHERE {
           ?opinionMaker ?opinion ?resource .
           ?follower sioc:follows ?opinionMaker .
           ?follower ?opinion ?resource .
           FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
                    && ?opinion != sd:accesses )
         }
       Slide annotations:
         Query registration (for continuous execution)
         FROM STREAM clause
         WINDOW
         RDF Stream added as new output format
         Builtin to access timestamps
         Aggregates as in SPARQL 1.1
  • 12. Background: Benchmarking RDF stream engines
       SRBench
         Dataset: LinkedSensorData (real meteorological sensor data)
         Queries: 17 continuous queries, some requiring RDFS reasoning
         KPI: feature coverage and correctness
       LSBench
         Dataset: synthetic, social-network-inspired data set
         Queries: 12 continuous queries involving multiple streams and static knowledge
         KPI: input throughput and correctness
       Slide annotations on correctness, in slide order: "Not verified"; "Verified comparing the number of results produced by different …"
  • 13. Is verifying correctness hard?
       Not for SPARQL: http://www.w3.org/2009/sparql/docs/tests/ (queries + expected results)
       However, it is hard for continuous (SPARQL) queries:
         1 query → multiple correct results
         Input data and query are not enough to determine the correct result
  • 14. A simple test
       Take the motivation scenario of CQELS:
         there are two connected rooms, r1 and r2;
         each room has a sensor able to detect the individuals inside, m1 and m2.
       The stream ST contains the following triples:
         S1 = <:m1 :detectedAt :r1>:[1]
         S2 = <:m2 :detectedAt :r1>:[3]
         S3 = <:m1 :detectedAt :r2>:[12]
         S4 = <:m2 :detectedAt :r2>:[15]
  • 15. The query of the simple test
       We want to know when the two individuals m1 and m2 are in the same room, using a time-based tumbling window of 10 seconds.
         REGISTER QUERY SimpleTest AS
         SELECT ?room
         FROM STREAM <http://ex.org/ST> [RANGE 10s STEP 10s]
         WHERE {
           :m1 :detectedAt ?room .
           :m2 :detectedAt ?room
         }
  • 16. All the results you can obtain
       Running the test in C-SPARQL, CQELS and SPARQLstream, the following results can be obtained.
       Are they all correct? How can this be?
  • 17. These engines have different operational semantics!
  • 18. The devil is in the details!
         S1 = <:m1 :detectedAt :r1>:[1]
         S2 = <:m2 :detectedAt :r1>:[3]
         S3 = <:m1 :detectedAt :r2>:[12]
         S4 = <:m2 :detectedAt :r2>:[15]
       [Figure: stream ST with S1 to S4 on a time axis at t = 1, 3, 12, 15, and candidate windows W0 to W6]
  • 19. Can the operational semantics of RDF Stream Engines be modelled?
       A model has been proposed to explain the differences that appear between different DSMS: SECRET.
       The results of a DSMS depend not only on the input and the query, but also on the system.
  • 20. ScopE in the SECRET model
       It is the time range [t_open, t_close) of the active window.
       It is determined using the size ω and slide β parameters of the window, as written by the query issuer.
       [Figure: time axis t = 1 to 7 with the application time t_app; windows W1, W2, W3; labels Closed, Open, Active; ω = 3, β = 2]
  • 21. Content in the SECRET model
       It is the subset of the stream included in the active window.
       It is determined using the size ω and slide β parameters of the window, and t0, the time instant at which the first window starts.
       [Figure: time axis with ticks at t = 1, 3, 12, 15; windows W0 to W3; ω = β = 10; different values for t0]
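To see why t0 matters, the simple test of slides 14 and 15 can be replayed over tumbling windows that start at different instants. The sketch below is a toy model, not any engine's implementation: the tuple encoding of the stream, the helper names, the horizon of 20 seconds, and the chosen t0 values (0, 2 and 5) are all assumptions made for illustration.

```python
# Stream ST from slide 14, encoded as (individual, room, timestamp).
STREAM = [("m1", "r1", 1), ("m2", "r1", 3), ("m1", "r2", 12), ("m2", "r2", 15)]


def tumbling_contents(stream, t0, size, horizon):
    """Return [((open, close), content)] for tumbling windows opening before `horizon`."""
    windows = []
    start = t0
    while start < horizon:
        close = start + size
        # Content: the elements whose timestamp falls in [open, close).
        content = [e for e in stream if start <= e[2] < close]
        windows.append(((start, close), content))
        start = close
    return windows


def same_room(content):
    """Rooms in which both m1 and m2 are detected within the given window content."""
    rooms_m1 = {room for who, room, _ in content if who == "m1"}
    rooms_m2 = {room for who, room, _ in content if who == "m2"}
    return sorted(rooms_m1 & rooms_m2)


def answers(t0, size=10, horizon=20):
    """Query answer per window, for windows of `size` seconds starting at t0."""
    return [same_room(content) for _, content in tumbling_contents(STREAM, t0, size, horizon)]


for t0 in (0, 2, 5):
    print(t0, answers(t0))
```

Under these assumptions, t0 = 0 puts S1 and S2 in one window and S3 and S4 in the next, so both rooms are reported. With t0 = 2, S1 falls before the first window and only r2 is found; with t0 = 5, S3 and S4 are split across windows and nothing is reported. The same query and input give three different answers.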
  • 22. Report in the SECRET model
       It defines the conditions under which the window contents become visible for further query evaluation and result reporting.
       It can take a logical combination of the following:
         content change
         window close
         non-empty content
         periodic
  • 23. Explaining the results of C-SPARQL
       The reporting strategy of C-SPARQL is window close and non-empty result.
         S1 = <:m1 :detectedAt :r1>:[1]
         S2 = <:m2 :detectedAt :r1>:[3]
         S3 = <:m1 :detectedAt :r2>:[12]
         S4 = <:m2 :detectedAt :r2>:[15]
       [Figure: stream ST with S1 to S4 at t = 1, 3, 12, 15; windows W0 to W6; ω = β = 10]
  • 24. Explaining the results of SPARQLstream
       The reporting strategy of SPARQLstream is window close.
         S1 = <:m1 :detectedAt :r1>:[1]
         S2 = <:m2 :detectedAt :r1>:[3]
         S3 = <:m1 :detectedAt :r2>:[12]
         S4 = <:m2 :detectedAt :r2>:[15]
       [Figure: stream ST with S1 to S4 at t = 1, 3, 12, 15; windows W0 to W6; ω = β = 10]
  • 25. Explaining the results of CQELS
       The reporting strategy of CQELS is content change and non-empty result.
         S1 = <:m1 :detectedAt :r1>:[1]
         S2 = <:m2 :detectedAt :r1>:[3]
         S3 = <:m1 :detectedAt :r2>:[12]
         S4 = <:m2 :detectedAt :r2>:[15]
       [Figure: stream ST with S1 to S4 at t = 1, 3, 12, 15; windows W0 to W6; ω = β = 10]
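The three reporting strategies named on slides 23 to 25 can be compared with a small simulation over the simple-test stream. This is a deliberately simplified model and does not reproduce the engines' code or their measured outputs: the `simulate` function, its two boolean flags, and the t0 values are hypothetical choices made here.

```python
# Stream ST from slide 14, encoded as (individual, room, timestamp).
STREAM = [("m1", "r1", 1), ("m2", "r1", 3), ("m1", "r2", 12), ("m2", "r2", 15)]


def same_room(content):
    """Rooms in which both m1 and m2 are detected within the given content."""
    rooms_m1 = {room for who, room, _ in content if who == "m1"}
    rooms_m2 = {room for who, room, _ in content if who == "m2"}
    return sorted(rooms_m1 & rooms_m2)


def simulate(stream, t0, size, on_content_change, require_non_empty, horizon):
    """Return [(report_time, result)] for tumbling windows of `size` starting at t0.

    on_content_change=False reports when a window closes;
    on_content_change=True reports whenever a new element enters the window.
    require_non_empty=True suppresses empty results.
    """
    out = []
    if on_content_change:
        for _, _, ts in stream:
            if ts < t0:
                continue  # the element arrives before the first window opens
            start = t0 + ((ts - t0) // size) * size
            content = [e for e in stream if start <= e[2] <= ts]
            result = same_room(content)
            if result or not require_non_empty:
                out.append((ts, result))
    else:
        for start in range(t0, horizon, size):
            close = start + size
            content = [e for e in stream if start <= e[2] < close]
            result = same_room(content)
            if result or not require_non_empty:
                out.append((close, result))
    return out


window_close = simulate(STREAM, 2, 10, False, False, 20)
window_close_non_empty = simulate(STREAM, 2, 10, False, True, 20)
content_change_non_empty = simulate(STREAM, 0, 10, True, True, 20)
```

In this toy model, plain window close emits an empty answer for a window with no match, the non-empty variant drops it, and content change reports as soon as the second detection arrives rather than at the window boundary. Both what is reported and when it is reported therefore depend on the strategy.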
  • 26. And so what?
  • 27. Let's test correctness!
       Data:
         S1 = <:m1 :detectedAt :r1>:[0]
         S2 = <:m2 :detectedAt :r2>:[5]
         S3 = <:m3 :detectedAt :r1>:[10]
         S4 = <:m4 :detectedAt :r2>:[15]
       Query:
         SELECT ?room WHERE {
           STREAM <http://ex.org/s1> [RANGE 3s SLIDE 3s] {
             ?p1 :detectedAt ?room .
             ?p2 :detectedAt ?room }
           FILTER (?p1 != ?p2) }
       [Figure: timeline with S1 to S4 at t = 0, 5, 10, 15; ω = β = 3]
  • 28. Results of the test
       Why?
  • 29. Trying to make sense of it …
       Data:
         S1 = <:m1 :detectedAt :r1>:[0]
         S2 = <:m2 :detectedAt :r2>:[5]
         S3 = <:m3 :detectedAt :r1>:[10]
         S4 = <:m4 :detectedAt :r2>:[15]
       Query:
         SELECT ?room WHERE {
           STREAM <http://ex.org/s1> [RANGE 3s SLIDE 3s] {
             ?p1 :detectedAt ?room .
             ?p2 :detectedAt ?room }
           FILTER (?p1 != ?p2) }
       Slide annotation on the FILTER: let's remove this filter.
       [Figure: timeline with S1 to S4 at t = 0, 5, 10, 15; ω = β = 3]
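For this second test the expected answer can be worked out by hand, and the reasoning is easy to check in code. The detections are 5 seconds apart and the tumbling window is 3 seconds wide, so no window can hold two of them, whatever t0 is. The sketch below assumes half-open windows and integer timestamps; the helper names and the `distinct_people` flag (standing in for the FILTER) are hypothetical.

```python
# Data from slide 27, encoded as (individual, room, timestamp).
STREAM = [("m1", "r1", 0), ("m2", "r2", 5), ("m3", "r1", 10), ("m4", "r2", 15)]


def rooms(content, distinct_people):
    """Rooms matched by the two triple patterns; distinct_people models FILTER (?p1 != ?p2)."""
    return sorted({
        room1
        for p1, room1, _ in content
        for p2, room2, _ in content
        if room1 == room2 and (p1 != p2 or not distinct_people)
    })


def evaluate(t0, distinct_people, size=3, horizon=18):
    """Query answer per tumbling window of `size` seconds, first window opening at t0."""
    results = []
    for start in range(t0, horizon, size):
        content = [e for e in STREAM if start <= e[2] < start + size]
        results.append(rooms(content, distinct_people))
    return results


with_filter = [evaluate(t0, True) for t0 in (0, -1, -2)]
without_filter = evaluate(0, False)
```

With the filter, every window yields an empty answer for all three alignments of t0, so under this model any non-empty answer would point to elements staying in the window longer than 3 seconds. Without the filter, ?p1 and ?p2 may bind to the same individual, so each window holding one detection reports that detection's room.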
  • 30. Results of the new test
       Is this caused by incorrect removal of the triples from the window?
  • 31. Does throughput matter without correctness?
  • 32. Conclusions
       The different operational semantics of existing RDF stream engines affect the outputs and the performance of those systems.
       Throughput measurements must be performed while testing correctness.
       Modelling RDF stream engines using SECRET allows for checking correctness.
       SRBench and LSBench should be extended with an "oracle" that checks correctness.
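One possible reading of the "oracle" in the conclusions is a checker that holds, for each admissible operational semantics, the output that semantics should produce, and accepts an engine's output only if it matches one of them. The sketch below follows that reading; the `oracle` function, the model labels, and the expected outputs (taken from a toy model of the simple test with t0 = 0) are assumptions, not part of SRBench or LSBench.

```python
def oracle(observed, admissible):
    """Return the labels of the operational-semantics models whose expected output equals `observed`.

    An empty list means that no admissible model explains the output.
    """
    return [label for label, expected in admissible.items() if expected == observed]


# Hypothetical expected outputs, as (report_time, rooms) pairs.
ADMISSIBLE = {
    "window close, t0=0": [(10, ["r1"]), (20, ["r2"])],
    "content change and non-empty result, t0=0": [(3, ["r1"]), (15, ["r2"])],
}

explained = oracle([(3, ["r1"]), (15, ["r2"])], ADMISSIBLE)
unexplained = oracle([(10, ["r1"])], ADMISSIBLE)
```

Keeping one expected output per model, rather than a single gold answer, reflects the point of slide 13 that one continuous query can have multiple correct results.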
  • 33. Thank you for your attention: questions?
       Daniele Dell’Aglio, Marco Balduini, and Emanuele Della Valle