SRBench Streaming RDF SPARQL Benchmark

SRBench benchmark for SPARQL RDF Streams

International Semantic Web Conference ISWC 2012




      SRBench: A Streaming
     RDF/SPARQL Benchmark

Ying Zhang (1), Pham Minh Duc (1), Oscar Corcho (2), Jean-Paul Calbimonte (2)
          (1) Centrum Wiskunde & Informatica, Amsterdam
          (2) Ontology Engineering Group, Universidad Politécnica de Madrid



                               jp.calbimonte@upm.es




                                                                             Date: 14/11/2012
Streaming Data


                            Data streams

                            Timestamped tuples


                            Time windows



                            Continuous evaluation

e.g. Data Stream Management Systems (DSMS)
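
The concepts on this slide (timestamped tuples, time windows, continuous evaluation) can be illustrated with a small sketch. It is not taken from any DSMS; the `Reading` type, the use of seconds as the time unit, and the choice of a tumbling (non-overlapping) window are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Reading(NamedTuple):
    """A timestamped tuple (hypothetical layout, for illustration)."""
    timestamp: int  # seconds since an arbitrary start (assumed unit)
    sensor: str
    value: float


def tumbling_windows(stream: Iterable[Reading], width: int) -> dict[int, list[Reading]]:
    """Group timestamped tuples into consecutive, non-overlapping windows."""
    windows: dict[int, list[Reading]] = defaultdict(list)
    for reading in stream:
        window_start = (reading.timestamp // width) * width
        windows[window_start].append(reading)
    return dict(windows)


def continuous_average(stream: Iterable[Reading], width: int) -> dict[int, float]:
    """Evaluate the same query (an average) once per window."""
    return {
        start: sum(r.value for r in readings) / len(readings)
        for start, readings in sorted(tumbling_windows(stream, width).items())
    }


readings = [
    Reading(0, "s1", 10.0),
    Reading(1800, "s1", 20.0),
    Reading(3600, "s1", 30.0),
    Reading(5400, "s2", 50.0),
]
hourly = continuous_average(readings, width=3600)
```

A real engine would evaluate each window as data arrives rather than over a finished list; the list here only keeps the sketch short.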

Sensor Web




“too much (streaming) data but not enough (tools to
gain and derive) knowledge”*




Semantic Web Tech for Streaming Data

Why?
 Annotate sensor data with semantic metadata

 Apply Linked Data principles to publish streaming data

 Interlink streaming data with existing datasets

 Integrate data stream processing + reasoning

 Raise the query abstraction level with ontologies




Semantic Sensor Web
   “too much (streaming) data but not enough (tools to
   gain and derive) knowledge”*

 Sensor data publishing
 Linked Data
 Semantic sensor metadata
 Semantic Sensor Network ontology

 [Figure: projects and systems shown: LinkedSensorData, LSM, Sense2Web,
  Sensor APIs, ETALIS, Videk, SwissEx, BOTTARI, UrbanMatch, AEMET,
  transporte.linkeddata.es, SSN + CEP]

  * Sheth et al. 2008, Semantic Sensor Web
Semantic Sensor Web
         Querying semantic streaming data

 RDF Streams
 SPARQL extensions
 SPARQL-based continuous query processors

 [Figure: systems shown: CQELS, Streaming SPARQL, EP-SPARQL, C-SPARQL,
  SPARQL-Stream]


  • 7. Approaches
      ◦ Similarities (roughly)
          ▪ Extend RDF for streaming data
          ▪ Extend SPARQL for streaming RDF
      ◦ Divergence
          ▪ Apply reasoning on streaming RDF
          ▪ Query rewriting to DSMS or CEP
          ▪ Logic-programming based query evaluation
          ▪ RDF streaming engine from scratch
  • 8. RDF Stream Processing Challenges
      ◦ How to specify queries?
          ▪ Standard query language extensions: not yet
      ◦ How to compare systems?
          ▪ Streaming RDF/SPARQL benchmarks
  • 9. SRBench
      ◦ First benchmark for streaming RDF engines
      ◦ Streaming RDF/SPARQL benchmark
      ◦ Assesses the engines' ability to deal with streaming data
      ◦ Based on real-world datasets
      ◦ Functional evaluation: which features are missing? crucial? distinctive?
  • 10. Existing Benchmarks?
      ◦ Linear Road Benchmark
          ▪ relational-based model ≠ RDF graph model
          ▪ no interlinking of data with other datasets
          ▪ no reasoning
      ◦ RDF/SPARQL benchmarks (LUBM, BSBM, SP2Bench, ...)
          ▪ meant for stored data
          ▪ one-off queries
          ▪ single static pre-generated dataset
          ▪ do not exploit LOD datasets
          ▪ no SPARQL 1.1 features*, no reasoning
      * Now the BSBM BI use case includes aggregates
  • 11. SRBench Challenges
      ◦ Proper benchmark dataset: relevant, realistic, semantically valid, interlinkable
      ◦ A concise set of features: time-bounded, summarization, continuous, data abstraction, contextual data, descriptive definition
      ◦ No standard query language: C-SPARQL, CQELS, SPARQL-Stream
  • 12. SRBench Datasets
      ◦ LinkedSensorData: real-world U.S. weather data (1); first & largest sensor dataset in LOD
      ◦ LinkedSensorMetadata
          ▪ ~20k US weather stations, ~100k sensors
          ▪ links to nearby locations in GeoNames
      ◦ LinkedObservationData
          ▪ hurricane & blizzard observations in the US
          ▪ ~1.73 billion RDF triples
          ▪ ~159 million observations

      Name     Storm Type  Date                #Triples     #Observations  Data size
      Bill     Hurricane   Aug. 17 – 22, 2009  231,021,108  21,272,790     ~15 GB
      Ike      Hurricane   Sep. 01 – 13, 2008  374,094,660  34,430,964     ~34 GB
      Gustav   Hurricane   Aug. 25 – 31, 2008  258,378,511  23,792,818     ~17 GB
      Bertha   Hurricane   Jul. 06 – 17, 2008  278,235,734  25,762,568     ~13 GB
      Wilma    Hurricane   Oct. 17 – 23, 2005  171,854,686  15,797,852     ~10 GB
      Katrina  Hurricane   Aug. 23 – 30, 2005  203,386,049  18,832,041     ~12 GB
      Charley  Hurricane   Aug. 09 – 15, 2004  101,956,760  9,333,676      ~7 GB
      -        Blizzard    Apr. 01 – 06, 2003  111,357,227  10,237,791     ~2 GB

      (1) http://mesowest.utah.edu
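
As a quick consistency check, the per-storm figures on this slide can be summed and compared with the quoted totals (~1.73 billion triples, ~159 million observations). The numbers below are copied from the slide; the tuple layout and variable names are only illustrative.

```python
# (name, storm type, #triples, #observations) as listed on the slide.
# The blizzard row has no name on the slide, so None is used here.
STORMS = [
    ("Bill", "Hurricane", 231_021_108, 21_272_790),
    ("Ike", "Hurricane", 374_094_660, 34_430_964),
    ("Gustav", "Hurricane", 258_378_511, 23_792_818),
    ("Bertha", "Hurricane", 278_235_734, 25_762_568),
    ("Wilma", "Hurricane", 171_854_686, 15_797_852),
    ("Katrina", "Hurricane", 203_386_049, 18_832_041),
    ("Charley", "Hurricane", 101_956_760, 9_333_676),
    (None, "Blizzard", 111_357_227, 10_237_791),
]

total_triples = sum(triples for _, _, triples, _ in STORMS)
total_observations = sum(observations for _, _, _, observations in STORMS)

# Average number of RDF triples used to describe one observation.
triples_per_observation = total_triples / total_observations

print(f"{total_triples / 1e9:.2f} billion triples")
print(f"{total_observations / 1e6:.0f} million observations")
print(f"{triples_per_observation:.1f} triples per observation")
```

The sums agree with the rounded totals on the slide, and the ratio suggests that each observation is described by roughly ten to eleven triples.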
  • 13. SRBench Datasets
      ◦ GeoNames
          ▪ geographical database, >8M places
          ▪ ~8M geographic features
          ▪ ~146M RDF triples
          ▪ ~10GB on disk
      ◦ DBpedia
          ▪ largest & most popular dataset in LOD
          ▪ structured information from Wikipedia
          ▪ links to GeoNames through owl:sameAs
          ▪ we only use the English language collection
          ▪ ~181M RDF triples
          ▪ ~27GB on disk
  • 14. SRBench Dataset model
      [Figure: dataset model connecting LinkedSensorData (LinkedObservationData, LinkedSensorMetadata), DBpedia and GeoNames]
      ◦ Classes shown: Observation, System, Instant, ResultData, MeasureData, TruthData, Point, LocatedNearRel, Airport, Feature
      ◦ Properties shown: om-owl:procedure, om-owl:samplingTime, om-owl:result, om-owl:hasLocatedNearRel, om-owl:processLocation, om-owl:hasLocation, owl:sameAs
  • 15. SRBench Queries (17 queries)
      ◦ graph pattern matching: and, filter, union, optional
      ◦ solution modifier: projection, distinct
      ◦ query form: select, construct, ask
      ◦ SPARQL 1.1: aggregate, subquery, select expr, property path
      ◦ reasoning: subclass, subproperty, sameAs
      ◦ streaming: time window, istream, dstream, rstream
      ◦ data access: observations, sensor metadata, geonames, dbpedia
  • 16. Query Features
      Legend
      1. Graph pattern matching: A = And, F = Filter, U = Union, O = Optional
      2. Solution modifier: P = Projection, D = Distinct
      3. Query form: S = Select, C = Construct, A = Ask
      4. SPARQL 1.1: A = Aggregate, S = Subquery, N = Negation, E = Expr in SELECT, M = assignMent, F = Functions & operators, P = PropertyPath
      5. Reasoning: C = subClassOf, R = subpRopertyOf, A = owl:sameAs
      6. Streaming: T = Time-based window, I = Istream, D = Dstream, R = Rstream
      7. Dataset: O = LinkedObservationData, S = LinkedSensorMetadata, G = GeoNames, D = Dbpedia

      Query  1. Graph pattern matching  2. Solution modifier  3. Query form
      Q1     A                          P,D                   S
      Q2     A,F,O                      P,D                   S
      Q3     A                          P                     A
      Q4     A,F                        P                     S
      Q5     A                          P                     C
      Q6     A,F,U                      P                     S
      Q7     A                          P,D                   S
      Q8     A                          P                     S
      Q9     A                          P                     S
      Q10    A                          P,D                   S
      Q11    A,F                        P,D                   S
      Q12    A,F,U                      P                     S
      Q13    A,F                        P                     S
      Q14    A,F,U                      P,D                   S
      Q15    A,F                        P                     S
      Q16    A,F                        P                     S
      Q17    A,F                        P                     S

      Rows 4 to 7 have empty or wrapped cells, so their values are listed in slide order without a per-query alignment:
      4. SPARQL 1.1: F,P A A,E,M A,S N A,E,M A,E,M A,S,M A,S,E, A,E,M F,P A,E,M P P ,F ,F M,F,P ,F,P ,P
      5. Reasoning: C R C A C
      6. Streaming: T T T T T T T,D T T T T T T T T
      7. Dataset: O O O O O O O O,S O,S O,S O,S O,S,G O,S,G O,S,G O,S,D O,S,G S ,D
  • 17. Sample Queries
      ◦ Q1. Get the rainfall observed once in an hour
          ▪ Features: time-window, basic graph patterns
          ▪ Pattern: a weather:RainfallObservation with om-owl:result result; result has om-owl:floatValue value and om-owl:uom uom
      ◦ Q2. Get all precipitation observed once in an hour
          ▪ Pattern: a weather:PrecipitationObservation with om-owl:result result; result has om-owl:floatValue value and om-owl:uom uom
          ▪ Classes shown under weather:PrecipitationObservation: weather:RainfallObservation, weather:GraupelObservation, weather:DrizzleObservation
  • 18. Sample Queries
      ◦ Q4. Get the average wind speed at stations where the air temperature is >32 deg. in the last hour, every 10 min
          ▪ Features: aggregates, filter, time-window
          ▪ Pattern: weather:WindSpeedObservation and weather:TemperatureObservation joined through om-owl:procedure
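
The evaluation that Q4 asks for can be sketched in plain Python as below. This is an illustration and not one of the benchmark implementations: the `Obs` tuple, the observation kind labels, and the reading of "temperature >32" as "at least one reading above 32 inside the window" are assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Obs(NamedTuple):
    """Hypothetical flattened observation: time in seconds, station, kind, value."""
    t: int
    station: str
    kind: str
    value: float


def q4_snapshot(observations, now, range_s=3600, min_temp=32.0):
    """Average wind speed per station, restricted to stations that reported
    an air temperature above min_temp within the last range_s seconds."""
    window = [o for o in observations if now - range_s < o.t <= now]
    hot = {o.station for o in window
           if o.kind == "temperature" and o.value > min_temp}
    speeds = defaultdict(list)
    for o in window:
        if o.kind == "wind_speed" and o.station in hot:
            speeds[o.station].append(o.value)
    return {station: sum(v) / len(v) for station, v in speeds.items()}


def q4_continuous(observations, start, end, step_s=600):
    """Re-evaluate the snapshot every step_s seconds (10 minutes by default)."""
    return {now: q4_snapshot(observations, now)
            for now in range(start, end + 1, step_s)}


obs = [
    Obs(100, "A", "temperature", 35.0),
    Obs(200, "A", "wind_speed", 10.0),
    Obs(900, "A", "wind_speed", 20.0),
    Obs(300, "B", "temperature", 20.0),
    Obs(400, "B", "wind_speed", 50.0),
]
results = q4_continuous(obs, start=600, end=1200)
```

Station B never exceeds the temperature threshold, so its wind speed readings are filtered out of both evaluations.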
  • 19. Sample Queries
      ◦ Q9. Get the daily average wind force and direction observed by the sensor at a given location.
          ▪ Pattern: AVG over om-owl:floatValue of the om-owl:result of weather:WindSpeedObservation
          ▪ Pattern: AVG over om-owl:floatValue of the om-owl:result of weather:WindDirectionObservation
          ▪ Beaufort scale (average wind speed → force): <1 → 0, <4 → 1, <8 → 2, <13 → 3, …
          ▪ Adds some semantics to bare wind speed numbers
          ▪ Post-processes qualified triple patterns
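
The mapping from an average wind speed to a wind force in Q9 can be sketched as a threshold lookup. Only the four thresholds visible on the slide are encoded; the rest of the scale is elided there, so the sketch returns `None` beyond them rather than guessing. The function names are hypothetical.

```python
from bisect import bisect_right

# Upper bounds (exclusive) for forces 0, 1, 2, 3, as shown on the slide.
# The slide ends with an ellipsis, so higher forces are deliberately left out.
UPPER_BOUNDS = [1, 4, 8, 13]


def wind_force(avg_speed: float):
    """Return the wind force for an average wind speed, or None when the
    speed is beyond the thresholds listed on the slide."""
    if avg_speed < 0:
        raise ValueError("wind speed cannot be negative")
    index = bisect_right(UPPER_BOUNDS, avg_speed)
    return index if index < len(UPPER_BOUNDS) else None


def daily_wind_force(speeds):
    """Average a day's wind speed readings, then map the mean to a force."""
    return wind_force(sum(speeds) / len(speeds))


print(wind_force(0.5), wind_force(3.9), wind_force(12.0))
print(daily_wind_force([2.0, 6.0, 10.0]))
```

`bisect_right` is used so that a speed equal to a threshold falls into the next force class, which matches the strict "<" comparisons on the slide.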
  • 20. Sample Queries
      ◦ Q12. Get the hourly average air temperature and humidity of large cities
          ▪ Pattern: sensor om-owl:hasLocatedNearRel, then om-owl:hasLocation feature; feature gn:population population
          ▪ Filter: population > 15000
  • 21. Query Implementations
      ◦ http://www.w3.org/wiki/SRBench
      ◦ C-SPARQL, SPARQLStream, CQELS
      ◦ Not exhaustive!
  • 22. Query implementations
      Q6. Get the stations that have observed extremely low visibility in the last hour.

      SPARQLStream:
        SELECT ?sensor
        FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS]
        WHERE {
          { ?observation om-owl:procedure ?sensor ;
                         a weather:VisibilityObservation ;
                         om-owl:result [ om-owl:floatValue ?value ] .
            FILTER ( ?value < "10"^^xsd:float ) }
          UNION
          { ?observation om-owl:procedure ?sensor ;
                         a weather:RainfallObservation ;
                         om-owl:result [ om-owl:floatValue ?value ] .
            FILTER ( ?value > "30"^^xsd:float ) }
          UNION
          { ?observation om-owl:procedure ?sensor ;
                         a weather:SnowfallObservation . }
        }

      C-SPARQL:
        SELECT ?sensor
        FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 1h TUMBLING]
        WHERE {
          { ?observation om-owl:procedure ?sensor ;
                         a weather:VisibilityObservation ;
                         om-owl:result [ om-owl:floatValue ?value ] .
            FILTER ( ?value < "10"^^xsd:float ) }
          UNION
          { ?observation om-owl:procedure ?sensor ;
                         a weather:RainfallObservation ;
                         om-owl:result [ om-owl:floatValue ?value ] .
            FILTER ( ?value > "30"^^xsd:float ) }
          UNION
          { ?observation om-owl:procedure ?sensor ;
                         a weather:SnowfallObservation . }
        }

      CQELS:
        SELECT ?sensor
        WHERE {
          STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 3600s] {
            { ?observation om-owl:procedure ?sensor ;
                           a weather:VisibilityObservation ;
                           om-owl:result [ om-owl:floatValue ?value ] .
              FILTER ( ?value < "10"^^xsd:float ) }
            UNION
            { ?observation om-owl:procedure ?sensor ;
                           a weather:RainfallObservation ;
                           om-owl:result [ om-owl:floatValue ?value ] .
              FILTER ( ?value > "30"^^xsd:float ) }
            UNION
            { ?observation om-owl:procedure ?sensor ;
                           a weather:SnowfallObservation . }
          }
        }
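
All three Q6 formulations share the same UNION of three patterns and differ only in how the one-hour window is declared. A rough Python rendering of that shared logic is given below; the dictionary keys (`sensor`, `type`, `value`) are a made-up flattening of the RDF observations, and the window is assumed to have been selected already.

```python
def q6_matches(obs: dict) -> bool:
    """True when an observation satisfies one of the three UNION branches."""
    kind = obs["type"]
    value = obs.get("value")
    if kind == "VisibilityObservation":
        return value is not None and value < 10
    if kind == "RainfallObservation":
        return value is not None and value > 30
    # Any snowfall observation qualifies, regardless of its value.
    return kind == "SnowfallObservation"


def q6_sensors(window) -> set:
    """Project the ?sensor binding over the observations of one window."""
    return {obs["sensor"] for obs in window if q6_matches(obs)}


last_hour = [
    {"sensor": "s1", "type": "VisibilityObservation", "value": 5.0},
    {"sensor": "s2", "type": "VisibilityObservation", "value": 50.0},
    {"sensor": "s3", "type": "RainfallObservation", "value": 42.0},
    {"sensor": "s4", "type": "SnowfallObservation"},
    {"sensor": "s5", "type": "WindSpeedObservation", "value": 99.0},
]
print(sorted(q6_sensors(last_hour)))
```

Returning a set removes duplicate sensors, which the queries on the slide do not do because they use no DISTINCT.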
  • 23. Query implementations
      Q3. Detect if a hurricane has been observed
      «A hurricane has a sustained wind (for more than 3 hours) of at least 33 metres per second or 74 miles per hour (119 km/h)»

      SPARQLStream:
        ASK
        FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 3 HOURS SLIDE 10 MINUTES]
        WHERE {
          ?observation om-owl:procedure ?sensor ;
                       om-owl:observedProperty weather:WindSpeed ;
                       om-owl:result [ om-owl:floatValue ?value ] .
        }
        GROUP BY ?sensor
        HAVING ( AVG(?value) >= "74"^^xsd:float )

      C-SPARQL:
        ASK
        FROM STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 3h STEP 10m]
        WHERE {
          ?observation om-owl:procedure ?sensor ;
                       om-owl:observedProperty weather:WindSpeed ;
                       om-owl:result [ om-owl:floatValue ?value ] .
        }
        GROUP BY ?sensor
        HAVING ( AVG(?value) >= "74"^^xsd:float )

      CQELS:
        ASK
        WHERE {
          STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 10800s SLIDE 600s] {
            ?observation om-owl:procedure ?sensor ;
                         om-owl:observedProperty weather:WindSpeed ;
                         om-owl:result [ om-owl:floatValue ?value ] .
          }
        }
        GROUP BY ?sensor
        HAVING ( AVG(?value) >= "74"^^xsd:float )
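
The three wind speed figures in the hurricane definition can be checked against each other, and the ASK logic of Q3 can be sketched as a per-sensor average over a three-hour window. The sketch assumes readings are `(time_in_seconds, sensor, speed_in_mph)` tuples, which is an illustrative layout rather than the benchmark's data format.

```python
from collections import defaultdict

METRES_PER_MILE = 1609.344
MPS_TO_MPH = 3600 / METRES_PER_MILE  # metres per second to miles per hour
MPS_TO_KMH = 3.6                     # metres per second to kilometres per hour


def hurricane_observed(readings, now, range_s=3 * 3600, threshold_mph=74.0):
    """ASK-style check: does any sensor have an average wind speed of at
    least threshold_mph over the last range_s seconds?"""
    per_sensor = defaultdict(list)
    for t, sensor, speed in readings:
        if now - range_s < t <= now:
            per_sensor[sensor].append(speed)
    return any(sum(v) / len(v) >= threshold_mph for v in per_sensor.values())


readings = [
    (0, "s1", 80.0),
    (3600, "s1", 76.0),
    (7200, "s1", 70.0),
    (7200, "s2", 20.0),
]

print(round(33 * MPS_TO_MPH), round(33 * MPS_TO_KMH))
print(hurricane_observed(readings, now=7200))
```

To mimic the ten-minute slide of the queries, `hurricane_observed` would be called again with `now` advanced by 600 seconds each time. An average over the window is a looser condition than a wind sustained for the whole period, which is also true of the queries on the slide.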
  • 24. Query implementations
      Q2. Get all precipitation observed once in an hour

        SELECT DISTINCT ?sensor ?value ?uom
        FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS]
        WHERE {
          ?observation om-owl:procedure ?sensor ;
                       rdf:type/rdfs:subClassOf* weather:PrecipitationObservation ;
                       om-owl:result ?result .
          ?result om-owl:floatValue ?value .
          OPTIONAL { ?result om-owl:uom ?uom . }
        }
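
The property path `rdf:type/rdfs:subClassOf*` in Q2 follows the type of an observation and then zero or more subclass links. A minimal sketch of what that path matches, over a tiny in-memory hierarchy, is shown below. The three precipitation subclasses come from the Q2 sample query slide; the dictionary representation and function names are assumptions, and the `weather:` prefix is omitted for brevity.

```python
# Direct rdfs:subClassOf links (child -> set of parents).
SUBCLASS_OF = {
    "RainfallObservation": {"PrecipitationObservation"},
    "GraupelObservation": {"PrecipitationObservation"},
    "DrizzleObservation": {"PrecipitationObservation"},
}


def superclasses(cls, subclass_of):
    """All classes reachable through zero or more subClassOf steps,
    including cls itself (the '*' in the property path)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the hierarchy
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return seen


def matches_type_path(asserted_type, target, subclass_of=SUBCLASS_OF):
    """True when asserted_type reaches target via rdf:type/rdfs:subClassOf*."""
    return target in superclasses(asserted_type, subclass_of)


print(matches_type_path("DrizzleObservation", "PrecipitationObservation"))
print(matches_type_path("WindSpeedObservation", "PrecipitationObservation"))
```

Because the path allows zero steps, an observation typed directly as `PrecipitationObservation` matches as well.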
  • 25. Functional Evaluation
      Legend: A = Ask, D = Dstream, G = Group by and aggregations, IF = IF expression, N = Negation, PP = Property Path, SD = Static Dataset

      The table has one column per query (Q1 to Q17) and leaves some cells empty, so the entries are listed per system in slide order without a per-query alignment:
      SPARQLStream: PP A G G G G,IF SD SD PP,SD PP,SD PP,SD PP,SD PP,SD PP,SD
      CQELS:        PP A D/N IF PP PP PP PP PP PP
      C-SPARQL:     PP A D IF PP PP PP PP PP PP
  • 26. Evaluation: Discussion
      ◦ the graph pattern matching features
      ◦ solution modifiers
      ◦ SELECT and CONSTRUCT query forms
      ◦ property path expressions are not supported
      ◦ lack of support for ASK
      ◦ DSTREAM, alternatively NOT EXISTS
      ◦ Lack of reasoning
      ◦ C-SPARQL: simple RDF entailment
      ◦ SPARQLStream: ontology-based query rewriting
      ◦ CQELS: native implementation
  • 27. Ongoing work: Test criteria
      ◦ Correctness
          ▪ query results validated
          ▪ possible variations in ordering
          ▪ possibly multiple valid results per query
          ▪ mismatch, precision/recall
      ◦ Throughput
          ▪ maximal number of data items a streaming RDF/SPARQL (strRS) engine is able to process per time unit
      ◦ Scalability
          ▪ increasing number of incoming streams
          ▪ increasing number of continuous queries to be processed
      ◦ Response time
          ▪ minimal elapsed time between a data item entering the system and being returned as output of a query
          ▪ mainly relevant for queries that allow immediate results upon receipt of a data item
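
The criteria on this slide can be made concrete with a few small helper functions. These are illustrative definitions only, not part of SRBench: results are compared as sets (so ordering variations do not matter), and all names and the input layout are assumptions.

```python
def precision_recall(expected: set, actual: set):
    """Precision and recall of an engine's result set against a reference.
    Empty sets are treated as perfect scores to avoid division by zero."""
    true_positives = len(expected & actual)
    precision = true_positives / len(actual) if actual else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall


def throughput(items_processed: int, elapsed_seconds: float) -> float:
    """Data items processed per second."""
    return items_processed / elapsed_seconds


def min_response_time(pairs) -> float:
    """Smallest delay between a data item entering the system and the
    matching query output; pairs are (time_in, time_out) in seconds."""
    return min(time_out - time_in for time_in, time_out in pairs)


precision, recall = precision_recall({1, 2, 3, 4}, {3, 4, 5})
print(precision, recall)
print(throughput(1000, 4.0))
print(min_response_time([(0.0, 0.5), (1.0, 1.25)]))
```

When a query admits several valid results, as the slide notes, the comparison would need to be repeated against each acceptable reference set.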
  • 28. Evaluation Issues
      ◦ Correctness, Throughput, Scalability
      ◦ Different outputs
      ◦ Differences in query semantics?
      ◦ Very different query evaluation approaches
      ◦ Reasoning
      ◦ Execution parameters
  • 29. Some related work
      ◦ Framework with a toolset for evaluating Linked Stream Data engines
          ▪ Linked Stream Data Processing Engines: Facts and Figures. Le-Phuoc et al. ISWC 2012
      ◦ Subset of functionalities
      ◦ Missing functions, reasoning, property paths, window-to-stream
      ◦ Synthetic data
          ▪ but flexibility
      ◦ First set of performance tests
      ◦ Showcase benefits of using semantic technologies?
  • 30. Close to the end
      ◦ SRBench: the first benchmark for streaming RDF engines
      ◦ Version 1
          ▪ SRBench specification
          ▪ Functional evaluation
      ◦ Much room left for improvements
          ▪ Streaming RDF processing is an evolving topic
          ▪ Exploiting more reasoning possibilities on semantic data
          ▪ Performance evaluation in Version 2
  • 31. Thanks
      ◦ Questions, please.
      ◦ jp.calbimonte@upm.es