Phd
© All Rights Reserved

Phd Presentation Transcript

  • Ontology-based Access to Sensor Data Streams. Jean-Paul Calbimonte. Supervisor: Oscar Corcho. Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid. jp.calbimonte@upm.es. PhD Thesis Defense, 18.4.2013
  • Outline (Ontology-based Access to Sensor Data Streams): Motivation; Challenges; Hypotheses & contributions; Background; Semantic stream query processing; Sensor metadata characterization; Conclusions
  • Motivation: from Sensor Networks, to the Sensor Web, and the Semantic Sensor Web
  • Sensors: data capture by different sensor providers, transmission, … data streams (image: http://www.flickr.com/photos/wouterh/2409251427/)
  • Sensor Networks and the Web: sensor networks deliver data streams (volume, velocity, variety) to users and applications on the Web. Goal: universal Web-based access to sensor data
  • Querying the Semantic Sensor Web: e.g. publish sensor data as RDF/Linked Data? (URIs as names of things; HTTP URIs; useful information when a URI is dereferenced; links to other URIs). Static vs. streams; one-off vs. continuous. Challenge 1: use ontology models to continuously query real-time data streams originated from sensors?
  • Research questions & hypotheses (challenge 1): Ontology models to query real-time sensor data streams? Access heterogeneous SPEs (stream processing engines) using ontologies as an overarching data model? SPARQL streaming extensions for querying data from SPEs?
    H1: Sensor streaming data → instances of an ontology model
    H2: SPARQL extensions → streaming operators & continuous processing
    H3: Ontology-based streaming queries → rewritten to relational-based queries using mappings
    H4: Ontology-based streaming queries → abstract expressions → concrete executable SPE queries
    H5: Query rewriting → pull & push delivery → acceptable overhead
  • Sensor Data: Observations. Citizen science; multiple publishers; heterogeneity; metadata quality
  • Sensor data: observations
  • Characterizing semantic sensor metadata: different publishers publish streams (e.g. through GSN) with different metadata; how can users and applications on the Web search/query relevant data sources? Challenge 2: characterizing sensor data, deriving semantic metadata from the sensor observations
  • Research questions & hypotheses (challenge 2): Data representation suitable for extracting data features that characterize a set of sensor streams? Classification and mining techniques to characterize sensor data streams?
    H6: Sensor data series → find characteristic patterns → make it recognizable among other types
    H7: Slope representations → semantic properties such as the type of data → learned with classification techniques → acceptable precision
  • Contributions
    Challenge 1, querying (SPARQLStream & R2RML): SPARQL extensions & formalization; rewriting to algebra expressions using declarative mappings; results data translation; query evaluation pluggable to different SPEs; query rewriting using R2RML mappings
    Challenge 2, metadata (sensor metadata characterization): data representation as slope distributions; characterize types of sensor data; classifying sensor time series; extract metadata features; derive semantic properties
  • Limitations
    L1: Rewriting → medium sampling throughput, e.g. environmental monitoring
    L2: Query expressivity is limited to that of the underlying SPEs
    L3: Adapters must be implemented for custom sources
    L4: Querying → only simple entailment
    L5: Arbitrarily noisy sensor series → no accurate characterization
    L6: Classification → number of sensor time series in the training set
    L7: Data characterization is computed offline, not in real time
  • Outline, Background: data streams; continuous queries; windows; SPEs; ontology-based data access
  • Sensor data streams & events: stations emit stream tuples (temp, hum, pres) with timestamps τ, consumed by event processing, e.g. milford1: … (35.6, 87, 4) τi-1, (36.2, 89, 4) τi, (37.2, 88, 4) τi+1 …; watford7: … (37.6, 88, 7) τi, (36.3, 89, 2) τi+1 …
  • Querying streams & events: one-off queries are answered by a query processor over a database; in an SPE, a continuous query processor evaluates queries over windows (w1, w2) of streaming tuples, e.g. SELECT attribute FROM stream [NOW -10 MIN], and results are pushed to the client or pulled on request
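The time-window semantics of the continuous query on this slide, and the difference between push and pull delivery, can be sketched in a few lines. This is a minimal Python illustration, not the API of any SPE. It assumes tuples arrive in timestamp order and timestamps are in seconds. The names `TimeWindow` and `ContinuousQuery` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class StreamTuple:
    ts: float      # timestamp τ in seconds (assumed unit)
    values: dict   # e.g. {"temp": 36.2, "hum": 89, "pres": 4}


class TimeWindow:
    """Hypothetical time-based window [now - range_s, now]."""

    def __init__(self, range_s: float):
        self.range_s = range_s
        self.buffer: deque = deque()

    def insert(self, t: StreamTuple) -> None:
        self.buffer.append(t)  # assumes tuples arrive in timestamp order

    def contents(self, now: float) -> List[StreamTuple]:
        # evict tuples that fell out of the window
        while self.buffer and self.buffer[0].ts < now - self.range_s:
            self.buffer.popleft()
        return [t for t in self.buffer if t.ts <= now]


class ContinuousQuery:
    """SELECT <attribute> FROM stream [NOW - range], evaluated continuously."""

    def __init__(self, window: TimeWindow, attribute: str,
                 on_result: Optional[Callable[[list], None]] = None):
        self.window = window
        self.attribute = attribute
        self.on_result = on_result  # set only for push delivery

    def _evaluate(self, now: float) -> list:
        return [t.values[self.attribute] for t in self.window.contents(now)]

    def on_tuple(self, t: StreamTuple) -> None:
        self.window.insert(t)
        if self.on_result is not None:
            # push: new results are sent as soon as data arrives
            self.on_result(self._evaluate(t.ts))

    def pull(self, now: float) -> list:
        # pull: the client requests the current results when it wants them
        return self._evaluate(now)


# usage: [NOW -10 MIN] is a 600 s window; results are pushed into a list
pushed: list = []
q = ContinuousQuery(TimeWindow(600), "temp", on_result=pushed.append)
for ts, temp in [(0, 35.6), (300, 36.2), (700, 37.2)]:
    q.on_tuple(StreamTuple(ts, {"temp": temp}))
```

At time 700 the window starts at 100, so the tuple at time 0 has been evicted from both the pushed result and a later pull.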
  • Stream Processing Engines (SPE): Data Stream Management Systems (DSMS), Complex Event Processors (CEP) and sensor data middleware, e.g. CQL/Stream, Borealis, TelegraphCQ, StreamMill, Cayuga, GEM, CEDR, NiagaraCQ, Rapide, Cosm, Hourglass, SStreamWare, GSN, IBM InfoSphere, Sybase CEP, Microsoft StreamInsight, Oracle CEP, Esper, StreamBase. Diverse query languages; different query capabilities; different query models
  • Extracting data from relational databases: ontology-based data access answers one-off SPARQL queries on the Web with data as RDF, using RDB-to-RDF mappings (e.g. R2RML) over static data in a relational database. Systems: D2R, Morph, ODEMapster, Triplify, UltraWrap, Mastro. Ontology: W3C SSN Ontology
  • Summary: existing SPEs are available and producing data streams; ontology-based access is available only for stored data; the SPARQL query language is not suitable for streams; SPEs are highly heterogeneous in models and queries
  • Outline, Semantic stream query processing: RDF stream; SPARQLStream; mappings using R2RML; query rewriting; execution over SPEs
  • RDF Streams: an RDF triple ⟨s, p, o⟩, e.g. ⟨aemet:observation1, qudt:hasNumericValue, "15.5"⟩ and ⟨aemet:observation1, ssn:observedBy, aemet:Sensor3⟩. For streams: timestamped triples (⟨s, p, o⟩, τ), e.g. (⟨aemet:observation1, qudt:hasNumericValue, "15.5"⟩, 34532).
    Gutierrez et al. (2007) Introducing time into RDF. IEEE TKDE
    Rodríguez et al. (2009) Semantic management of streaming data. SSN
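The timestamped-triple model can be made concrete as a list of (triple, τ) pairs. Selecting a time range then yields an ordinary RDF graph. This Python sketch is illustrative only. The first triple and its timestamp come from the slide. The timestamp of the second triple and the third triple are hypothetical, and `snapshot` is a hypothetical helper name.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
TimestampedTriple = Tuple[Triple, int]   # (<s, p, o>, τ)

rdf_stream: List[TimestampedTriple] = [
    # example from the slide, timestamp 34532
    (("aemet:observation1", "qudt:hasNumericValue", '"15.5"'), 34532),
    # hypothetical: same timestamp assumed for the second example triple
    (("aemet:observation1", "ssn:observedBy", "aemet:Sensor3"), 34532),
    # hypothetical later observation
    (("aemet:observation2", "qudt:hasNumericValue", '"16.1"'), 34590),
]


def snapshot(stream: List[TimestampedTriple], start: int, end: int) -> Set[Triple]:
    """Plain RDF graph made of the triples whose timestamp lies in [start, end]."""
    return {triple for triple, tau in stream if start <= tau <= end}
```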
  • SPARQLStream extensions: named streams and time windows. A SPARQL query
    SELECT (MAX(?temperature) AS ?maxtemp) ?sensor
    WHERE {
      ?obs ssn:observedBy ?sensor.
      ?obs ssn:observationResult ?res.
      ?res aemet:hasAirTemperatureValue ?val.
      ?val qu:numericValue ?temperature.
    }
    GROUP BY ?sensor
    becomes, in SPARQLStream, a query over a named stream with a time window:
    SELECT (MAX(?temp) AS ?maxtemp) ?sensor
    FROM NAMED STREAM <http://aemet.linkeddata.es/observations.srdf> [NOW-1 HOURS]
    WHERE {
      ?obs ssn:observedBy ?sensor.
      ?obs ssn:observationResult ?res.
      ?res aemet:hasAirTemperatureValue ?val.
      ?val qu:numericValue ?temp.
    }
    GROUP BY ?sensor
    Other approaches: Streaming SPARQL (2008), C-SPARQL (2009), CQELS (2011), EP-SPARQL (2011), INSTANS (2012)
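To make the meaning of the windowed aggregate concrete, here is a naive Python evaluation over an in-memory list of timestamped triples. It keeps the triples of the last hour, joins the four triple patterns, then groups by `?sensor` and takes the MAX. It only illustrates the semantics under stated assumptions, not how SPARQLStream executes queries (which is by rewriting to an SPE). The sample data, the `ex:` names and both function names are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def max_temp_per_sensor(stream: List[Tuple[Triple, int]], now: int,
                        range_s: int = 3600) -> Dict[str, float]:
    # [NOW-1 HOURS]: keep the triples whose timestamp is within the last hour
    g = [t for t, tau in stream if now - range_s <= tau <= now]

    def objects(s: str, p: str) -> List[str]:
        return [o for s2, p2, o in g if s2 == s and p2 == p]

    result: Dict[str, float] = {}
    for obs, p, sensor in g:
        if p != "ssn:observedBy":
            continue
        # join ?obs -> ?res -> ?val -> ?temp along the graph pattern
        for res in objects(obs, "ssn:observationResult"):
            for val in objects(res, "aemet:hasAirTemperatureValue"):
                for temp in objects(val, "qu:numericValue"):
                    # GROUP BY ?sensor with MAX(?temp)
                    t = float(temp)
                    result[sensor] = max(result.get(sensor, t), t)
    return result


def observation(obs: str, sensor: str, temp: str, tau: int):
    """Hypothetical helper: the four triples of one observation, all at time tau."""
    return [((obs, "ssn:observedBy", sensor), tau),
            ((obs, "ssn:observationResult", obs + "/res"), tau),
            ((obs + "/res", "aemet:hasAirTemperatureValue", obs + "/val"), tau),
            ((obs + "/val", "qu:numericValue", temp), tau)]


stream = (observation("ex:o1", "ex:S1", "15.5", 1000)
          + observation("ex:o2", "ex:S1", "17.0", 2000)
          + observation("ex:o3", "ex:S2", "12.0", 2500)
          + observation("ex:o4", "ex:S2", "30.0", 100))  # outside the window at now=4000
```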
  • Streaming SPARQL execution approaches. Similarities: extend RDF for streaming data; extend SPARQL for streaming RDF. Divergences: use an SPE internally for evaluation; query rewriting to SPEs (SPARQLStream, over DSMSs, CEPs and middleware); RDF streaming engine from scratch; logic-programming-based query evaluation
  • Mapping SPE schemas and ontologies: stream-to-ontology mappings connect SPE data schemas, queried with SPE queries, and ontology models (ssn:Observation), queried with SPARQLStream queries.
    SPE schema wan7 (timed: datetime PK, sp_wind: float); sample tuples (timed, sp_wind): (1, 3.4), (2, 5.6), (3, 11.2), (4, 1.2), (5, 3.1), …
    SPE query: SELECT sp_wind FROM wan7 [NOW -5 HOUR] WHERE sp_wind >10
    SPARQLStream query:
    SELECT ?wspeed
    FROM STREAM <SensorReadings.srdf> [NOW-5 HOUR]
    WHERE {?obs a ssn:ObservationValue;
        qudt:numericalValue ?wspeed;
        FILTER (?wspeed>10) }
  • Creating Mappings (W3C R2RML Mapping Language): the wan7 schema (timed: datetime PK, sp_wind: float) is mapped to the SSN ontology. An ssn:Observation has an ssn:observationResult (ssn:SensorOutput), which has an ssn:hasValue (ssn:ObservationValue) with a qudt:numericValue (xsd:decimal); the ssn:Observation has an ssn:observedProperty (ssn:Property, here sweetSpeed:WindSpeed). URI templates: http://swissex.ch/data#Wan7/WindSpeed/Observation{timed}, http://swissex.ch/data#Wan7/WindSpeed/ObsOutput{timed}, http://swissex.ch/data#Wan7/WindSpeed/ObsValue{timed} (value taken from sp_wind).
    :Wan4WindSpeed a rr:TriplesMapClass;
        rr:tableName "wan7";
        rr:subjectMap [ rr:template "http://swissex.ch/data#Wan7/WindSpeed/ObsValue/{timed}";
                        rr:class ssn:ObservationValue;
                        rr:graph ssg:swissexsnow.srdf ];
        rr:predicateObjectMap [ rr:predicateMap [ rr:predicate qudt:numericValue ];
                                rr:objectMap [ rr:column "sp_wind"; rr:datatype xsd:decimal ] ].
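The mapping above turns each wan7 tuple into triples by filling the subject template from the `timed` column and taking the object literal from `sp_wind`. The following Python sketch shows only that template/column idea. It is not an R2RML processor and it does not parse Turtle. `TRIPLES_MAP` is a hypothetical in-memory rendering of `:Wan4WindSpeed`, and the sample row values are taken from the wan7 example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical in-memory form of the :Wan4WindSpeed triples map
TRIPLES_MAP = {
    "table": "wan7",
    "subject_template": "http://swissex.ch/data#Wan7/WindSpeed/ObsValue/{timed}",
    "class": "ssn:ObservationValue",
    "predicate": "qudt:numericValue",
    "object_column": "sp_wind",
    "datatype": "xsd:decimal",
}


def expand_template(template: str, row: Dict[str, object]) -> str:
    # replace each {column} placeholder with the value of that column
    # (R2RML also applies IRI-safe percent-encoding to such values; omitted here)
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def map_row(tm: dict, row: Dict[str, object]) -> List[Tuple[str, str, str]]:
    subject = "<" + expand_template(tm["subject_template"], row) + ">"
    literal = f'"{row[tm["object_column"]]}"^^{tm["datatype"]}'
    return [(subject, "rdf:type", tm["class"]),
            (subject, tm["predicate"], literal)]


triples = map_row(TRIPLES_MAP, {"timed": 3, "sp_wind": 11.2})
```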
  • Query rewriting: using the R2RML mappings, the SPARQLStream query
    SELECT ?windspeed
    FROM STREAM <http://ssg4env.eu/SensorReadings.srdf> [NOW-5 HOUR TO NOW]
    WHERE {?obs a ssn:ObservationValue;
        qudt:numericalValue ?windspeed;
        FILTER (?windspeed>10) }
    is rewritten to the algebra expression $\pi_{timed,sp\_wind}(\omega_{5\,hour}(\sigma_{sp\_wind>10}(wan7)))$, which is then translated into the query language of each SPE:
    SNEE (DSMS): SELECT sp_wind FROM wan7 [FROM NOW-5 HOURS TO NOW] WHERE sp_wind >10
    Esper (CEP): SELECT sp_wind FROM wan7.win:time(5 hour) WHERE sp_wind >10
    GSN (middleware): http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=wind_speed_scalar_av&c_min[0]=10&from=15/05/2012+05:00:00&to=15/05/2012+10:00:00
    Cosm (middleware): http://api.cosm.com/v2/feeds/14321/datastreams/4?start=2012-05-15T05:00:00Z&end=2012-05-15T10:00:00Z
    H3: Ontology-based streaming queries → rewritten to relational-based queries using mappings
    H4: Ontology-based streaming queries → abstract expressions → concrete executable SPE queries
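The last step of this rewriting, from one abstract algebra expression to several concrete SPE queries, can be sketched as simple code generation. In this Python sketch a plain variable-to-column dictionary stands in for the R2RML mappings. All class and function names are hypothetical. The generators only reproduce the SNEE and Esper strings shown on the slide. As in those strings, `timed` is not emitted, on the assumption that the engine supplies the timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Selection:        # σ: attr > value
    attr: str
    value: float


@dataclass
class AlgebraQuery:     # π attrs (ω window (σ selection (stream)))
    attrs: List[str]
    window_hours: int
    selection: Selection
    stream: str


def rewrite(var: str, threshold: float, hours: int,
            var_to_column: Dict[str, str], stream: str) -> AlgebraQuery:
    """Hypothetical rewriting of SELECT ?var ... FILTER(?var > threshold);
    var_to_column stands in for the R2RML mappings."""
    col = var_to_column[var]
    return AlgebraQuery([col], hours, Selection(col, threshold), stream)


def to_snee(q: AlgebraQuery) -> str:
    return (f"SELECT {', '.join(q.attrs)} FROM {q.stream} "
            f"[FROM NOW-{q.window_hours} HOURS TO NOW] "
            f"WHERE {q.selection.attr} >{q.selection.value:g}")


def to_esper(q: AlgebraQuery) -> str:
    return (f"SELECT {', '.join(q.attrs)} FROM {q.stream}.win:time({q.window_hours} hour) "
            f"WHERE {q.selection.attr} >{q.selection.value:g}")


q = rewrite("?windspeed", 10, 5, {"?windspeed": "sp_wind"}, "wan7")
```

Keeping a single intermediate algebra expression means that adding another SPE only requires another generator function, which is the pluggability claimed in H4.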
  • Ontology-based query rewriting (SPARQLStream query processing): the client sends a SPARQLStream query; query rewriting uses the R2RML mappings to produce an algebra expression; query processing runs it on an SPE (SNEE, Esper, GSN, Cosm or other) with pull/push delivery and returns [tuples]; data translation turns them into [triples/bindings] for the client. Implementation: https://github.com/jpcik/morph-streams
    SELECT ?windspeed
    FROM STREAM <http://ssg4env.eu/SensorReadings.srdf> [NOW-5 HOUR]
    WHERE {?obs a ssn:ObservationValue;
        qudt:numericalValue ?windspeed;
        FILTER (?windspeed>10) }
    → $\pi_{timed,sp\_wind}(\omega_{5\,hour}(\sigma_{sp\_wind>10}(wan7)))$ → SELECT sp_wind FROM wan7.win:time(5 hour) WHERE sp_wind >10
    H1: Sensor streaming data → instances of an ontology model
    H2: SPARQL extensions → streaming operators & continuous processing
  • Evaluation of query rewriting overhead. H5: Query rewriting → pull & push delivery → acceptable overhead. Native execution without rewriting vs. execution with rewriting; pull & push delivery; end-to-end latency; adapted Esper benchmark
  • Outline, Sensor metadata characterization: representation; classification; metadata
  • Characterizing semantic sensor metadata: is a GSN series on the Web air pressure? Air temperature? Compare unclassified input series with already classified time series
  • Deriving Semantic Metadata: representation → classification → metadata
  • Piecewise Linear Approximation: approximate a time series with linear segments to reduce numerosity. Reflects data trends; applicable with different resolutions and for different rates; cheap online computation. (Figure: a time series over time and its piecewise linear approximation.)
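The slide does not say which PLA algorithm is used. As one common option, here is a minimal sliding-window PLA in Python: a segment grows while every point stays within `max_error` of the straight line joining the segment's endpoints. The function names and the error criterion are assumptions. Consecutive segments share an endpoint, so the output is a connected polyline given as index pairs.

```python
from typing import List, Tuple


def _max_dev(ys: List[float], i: int, j: int) -> float:
    # largest vertical distance between points i..j and the line joining ys[i], ys[j]
    slope = (ys[j] - ys[i]) / (j - i)
    return max(abs(ys[k] - (ys[i] + slope * (k - i))) for k in range(i, j + 1))


def sliding_window_pla(ys: List[float], max_error: float) -> List[Tuple[int, int]]:
    """Return segments as (start_index, end_index) pairs (hypothetical sketch)."""
    segments = []
    anchor, n = 0, len(ys)
    while anchor < n - 1:
        end = anchor + 1
        # extend the segment while all covered points stay within max_error
        while end + 1 < n and _max_dev(ys, anchor, end + 1) <= max_error:
            end += 1
        segments.append((anchor, end))
        anchor = end
    return segments
```

A larger `max_error` gives fewer, coarser segments, which corresponds to the slide's point about applying the approximation at different resolutions.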
  • Linear Approximations. Key: segment slopes (angles). Divide the angle space (-π/2 to π/2) into sectors a, b, c, d, with boundaries at -π/4, 0 and π/4; the training set gives a distribution of angles for each type. Steps: (1) compute the linear approximation, (2) compute the slope distribution, (3) K-nearest neighbor classification
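The three steps can be sketched in Python. Starting from PLA breakpoints, the sketch computes each segment's angle, counts angles per sector, and classifies the normalized histogram by majority vote among the k nearest training histograms (Euclidean distance). The sector boundaries follow the slide. The distance measure, `k=3`, the function names and the sample training histograms are assumptions, not the thesis's exact setup. Angles depend on the units of time and value, so in practice the series would need consistent scaling.

```python
import math
from collections import Counter
from typing import List, Tuple

SECTOR_BOUNDS = [-math.pi / 4, 0.0, math.pi / 4]   # sectors a, b, c, d


def sector(angle: float) -> int:
    # index of the sector containing the angle (0 = a, ..., 3 = d)
    return sum(angle >= b for b in SECTOR_BOUNDS)


def slope_distribution(points: List[Tuple[float, float]]) -> List[float]:
    """points: PLA breakpoints [(t, y), ...]; returns a normalized sector histogram."""
    counts = [0] * (len(SECTOR_BOUNDS) + 1)
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        counts[sector(math.atan2(y1 - y0, t1 - t0))] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def knn_classify(dist: List[float], training: List[Tuple[List[float], str]],
                 k: int = 3) -> str:
    """Majority vote among the k training histograms closest to dist."""
    nearest = sorted(training, key=lambda item: math.dist(dist, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# hypothetical training histograms for two sensor types
training = [([0, 0.5, 0.5, 0], "air_temperature"), ([0, 0.4, 0.6, 0], "air_temperature"),
            ([0.5, 0, 0, 0.5], "wind_speed"), ([0.6, 0, 0, 0.4], "wind_speed")]
```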
  • Experiments SwissEx: training/test datasets from SwissExperiment and AEMET; confusion matrix for SwissEx
  • Experiments AEMET: confusion matrix for AEMET. H6: Sensor data series → find characteristic patterns → make it recognizable among other types. Classification according to type; false positives (FPs) on subclasses of the same property
  • Evaluation vs SAX. H7: Slope representations → type of data: semantic property learned through classification
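For reference, the SAX representation used as the baseline can be sketched with its standard steps: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA), and map each frame mean to a symbol using standard-normal breakpoints. This minimal Python sketch assumes an alphabet of four symbols and a series length divisible by the number of frames. It is not the configuration used in the thesis evaluation.

```python
import statistics
from typing import List

# standard-normal quartiles: breakpoints for an alphabet of 4 symbols
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax(series: List[float], n_segments: int) -> str:
    if len(series) % n_segments:
        raise ValueError("this sketch assumes len(series) is a multiple of n_segments")
    # 1. z-normalize (a constant series keeps std 1.0 to avoid division by zero)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series) or 1.0
    z = [(x - mean) / std for x in series]
    # 2. PAA: mean of each equal-length frame
    size = len(z) // n_segments
    paa = [statistics.fmean(z[i * size:(i + 1) * size]) for i in range(n_segments)]
    # 3. symbol = number of breakpoints at or below the frame mean
    return "".join(ALPHABET[sum(v >= b for b in BREAKPOINTS)] for v in paa)
```

Unlike the slope distribution, SAX encodes the level of each frame rather than its trend, which is the contrast this evaluation examines.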
  • Semantic Sensor Metadata: from raw sensor data (station1) to semantic metadata, deriving semantic metadata properties with the W3C SSN Ontology:
    swissex:Sensor1 rdf:type ssn:Sensor;
        ssn:onPlatform swissex:Station1;
        ssn:observes cf-property:wind_speed.
    swissex:Sensor2 rdf:type ssn:Sensor;
        ssn:onPlatform swissex:Station1;
        ssn:observes cf-property:air_temperature.
    cf-property:wind_speed rdf:type dim:VelocityOrSpeed;
        rdfs:label "wind speed";
        ssn:isPropertyOf cf-feature:wind;
        qu:propertyType qu:scalar;
        qu:generalQuantityKind qu:speed.
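Once a series has been classified, emitting metadata in the style shown above is mostly string generation. This Python sketch turns (sensor, station, predicted label) tuples into Turtle. The label-to-property table `CF_PROPERTIES` and both function names are hypothetical, and a real pipeline would more likely use an RDF library than string formatting.

```python
from typing import Dict, List, Tuple

# hypothetical mapping from classifier labels to CF property names
CF_PROPERTIES: Dict[str, str] = {
    "wind_speed": "cf-property:wind_speed",
    "air_temperature": "cf-property:air_temperature",
}


def sensor_metadata(sensor: str, station: str, label: str) -> str:
    """Turtle description of one sensor, following the swissex example."""
    return (f"{sensor} rdf:type ssn:Sensor;\n"
            f"    ssn:onPlatform {station};\n"
            f"    ssn:observes {CF_PROPERTIES[label]}.")


def describe(classified: List[Tuple[str, str, str]]) -> str:
    # classified: (sensor, station, predicted label) tuples from the classifier
    return "\n".join(sensor_metadata(s, st, label) for s, st, label in classified)


ttl = describe([("swissex:Sensor1", "swissex:Station1", "wind_speed"),
                ("swissex:Sensor2", "swissex:Station1", "air_temperature")])
```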
  • Outline (Ontology-based Access to Sensor Data Streams): Motivation; Challenges; Hypotheses & contributions; Background; Semantic stream query processing; Sensor metadata characterization; Conclusions
  • Conclusions
    H1: Sensor streaming data → instances of an ontology model
    H2: SPARQL extensions → streaming operators & continuous processing
    H3: Ontology-based streaming queries → rewritten to relational-based queries using mappings
    Mapping sensor data to ontology instances, e.g. SSN Ontology. SPARQLStream → data model, extensions syntax, semantics. SPARQLStream → semantics of query rewriting to relational streaming algebra → usage of declarative mappings (W3C R2RML).
    Calbimonte, Corcho & Gray. Enabling ontology-based access to streaming data sources. ISWC 2010
    Gray, García-Castro, Kyzirakos, Karpathiotakis, Calbimonte, Page et al. A semantically enabled service architecture for mashups over streaming and stored data. ESWC 2011
    Gray, Sadler, Kit, Kyzirakos, Karpathiotakis, Calbimonte, Page, García-Castro, et al. A semantic sensor web for environmental decision support applications. Sensors, MDPI, 2011
    Calbimonte, Corcho & Gray. Ontology-based Access to Streaming Data. In Posters ESWC 2010
  • Conclusions
    H4: Ontology-based streaming queries → abstract expressions → concrete executable SPE queries. Instantiate and execute on different SPEs: SNEE (DSMS), Esper (CEP), GSN & Cosm (middleware); available implementation; application in different domains.
    H5: Query rewriting → pull & push delivery → evaluation overhead. SPARQLStream evaluation overhead wrt. native execution; push & pull delivery evaluation.
    Calbimonte, Jeung, Corcho & Aberer. Enabling Query Technologies for the Semantic Sensor Web. IJSWIS 2012
    Calbimonte & Corcho. Evaluating SPARQL Queries over RDF Streams. Linked Data Management: Principles and Techniques, CRC Press, 2013 (under review)
    Zhang, Duc, Corcho & Calbimonte. SRBench: A Streaming RDF/SPARQL Benchmark. ISWC 2012
    Ruckhaus, Calbimonte, García-Castro & Corcho. Short Paper: From Streaming Data to Linked Data–A Case Study with Bike Sharing Systems. ISWC SSN 2012
  • Conclusions
    H6: Sensor data series → analyze in order to find characteristic patterns → make it recognizable among other types
    H7: Slope representations → semantic properties such as the type of data → learned with classification techniques → acceptable precision
    Raw observations analysis → slope distribution representation → compared with state-of-the-art representations, i.e. SAX. Evaluation of the classification task → real-world datasets AEMET, SwissEx → in the presence of noisy data → deriving semantic metadata.
    Calbimonte, Yan, Jeung, Corcho & Aberer. Deriving Semantic Sensor Metadata from Raw Measurements. ISWC SSN 2012
    Calbimonte, Jeung, Corcho & Aberer. Semantic Sensor Data Search in a Large-Scale Federated Sensor Network. ISWC SSN 2011
  • Future directions: SPARQLStream queries on the Web
    Publishing Linked Stream Data: currently static; SPARQL streaming standards; dereferencing streaming data
    Query federation: distributed sensor data; static and streaming sources
    Stream reasoning: query rewriting, expanding queries; expressiveness; inferencing; integrate with the Web of Data
  • Future directions
    Sensor pattern classification: combine with query processing; live data classification
    Statistical & quality analysis: integrate statistical analysis; mappings to statistical models; data quality filtering
    Parallel massive stream processing: online stream analysis; scalable stream processing (S4, Storm, StreamCloud); heterogeneity
  • SSN Ontology with other ontologies: the W3C SSN Ontology as a tool for modeling our sensor data, combined with domain ontologies
  • Algebra construction: $\pi_{timed,sp\_wind}(\omega_{5\,hour}(\sigma_{sp\_wind>10}(wan7)))$; further source streams: windsensor1, windsensor2
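  • Static optimization: the expression is instantiated for each source stream, each with its own attribute names:
    $\pi_{timed,sp\_wind}(\omega_{5\,hour}(\sigma_{sp\_wind>10}(wan7)))$
    $\pi_{timed,windvalue}(\omega_{5\,hour}(\sigma_{windvalue>10}(windsensor1)))$
    $\pi_{timed,windvalue}(\omega_{5\,hour}(\sigma_{windvalue>10}(windsensor2)))$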
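The per-stream instantiation can be sketched as a small plan generator. Given the logical attributes, the window and the selection condition, it emits one π/ω/σ branch per source stream, using that stream's own column names. This Python sketch assumes that the branches are combined with a union. The logical attribute names `time` and `wind`, the `SOURCES` table and the function name are hypothetical.

```python
from typing import Dict, List


def push_down(attrs: List[str], window: str, cond_attr: str, threshold: float,
              sources: Dict[str, Dict[str, str]]) -> str:
    """One π/ω/σ branch per source stream, joined with ∪ (assumed).
    sources maps each stream to {logical attribute: physical column}."""
    branches = []
    for stream, cols in sources.items():
        proj = ",".join(cols[a] for a in attrs)
        cond = f"{cols[cond_attr]}>{threshold:g}"
        branches.append(f"π[{proj}](ω[{window}](σ[{cond}]({stream})))")
    return " ∪ ".join(branches)


SOURCES = {
    "wan7": {"time": "timed", "wind": "sp_wind"},
    "windsensor1": {"time": "timed", "wind": "windvalue"},
    "windsensor2": {"time": "timed", "wind": "windvalue"},
}
plan = push_down(["time", "wind"], "5 hour", "wind", 10, SOURCES)
```

Instantiating the operators per source lets each SPE filter and window its own stream before any results are combined.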
  • SPARQL Streaming extensions
  • SPARQL Stream features
  • SRBench
  • RDF Streams and SPARQLStream: RDF stream; time window; window-to-stream operators
  • Mappings (subject, predicate, object). Evaluate query: given a triple pattern $tp = (sp, pp, op)$, the semantics of its evaluation over relational streams referenced by a set of mappings $M$ is given by $\mathrm{eval}(tp, M)$, which is an algebra expression defined as:
    $$\mathrm{eval}(tp, M) = \rho_{f_s \to sp,\; f_p \to pp,\; f_o \to op}\; \pi_{f_s, f_p, f_o}(s)$$
    where $\rho$ is the relational rename operation and $\pi$ is the relational projection operation. $s$ is the stream referenced by the mapping $\mu = \mathrm{findMapping}(tp, M)$, and $f_s, f_p, f_o$ are the functions of $\mu$ that generate the projection expressions for producing, respectively, the subject, predicate and object for every tuple of $s$. For the previous example, the evaluation of $tp_1$ is given by:
    $$\mathrm{eval}(tp_1, M) = \rho_{f_s \to sp,\; f_p \to pp,\; f_o \to op}\; \pi_{f_s^{\mu_1}(s_1.ts),\; f_p^{\mu_1}(),\; f_o^{\mu_1}()}(s_1)$$
    The resulting algebra expression projects the $s_1.ts$ attribute, applying the $f_s$ function to create the subject. The functions $f_p^{\mu_1}$ and $f_o^{\mu_1}$ are constants in this case, as the predicate and object are the same for all tuples of $s_1$. For the evaluation of more complex …
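The definition of $\mathrm{eval}(tp, M)$ reads naturally as code. First project each tuple of $s$ through $f_s, f_p, f_o$ ($\pi$), then rename the projected columns to the variables of the triple pattern ($\rho$). This Python sketch assumes `findMapping` has already selected $\mu$, so constant terms of $tp$ are taken to match. The `Mapping` class, the `ex:obs/` subject template and the pattern `tp1` are hypothetical stand-ins for the thesis example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class Mapping:
    """Hypothetical stand-in for a mapping µ: its stream plus f_s, f_p, f_o."""
    stream: str
    fs: Callable[[Row], object]
    fp: Callable[[Row], object]
    fo: Callable[[Row], object]


def eval_triple_pattern(tp: Tuple[str, str, str], mu: Mapping,
                        streams: Dict[str, List[Row]]) -> List[Dict[str, object]]:
    """ρ_{fs→sp, fp→pp, fo→op} π_{fs, fp, fo}(s) with s = mu.stream."""
    results = []
    for row in streams[mu.stream]:
        # π: compute subject, predicate and object values for this tuple
        projected = (mu.fs(row), mu.fp(row), mu.fo(row))
        # ρ: rename the projected columns to the variables of tp
        results.append({term: value for term, value in zip(tp, projected)
                        if term.startswith("?")})
    return results


# µ1: subject built from s1.ts, predicate and object constant (as on the slide)
mu1 = Mapping("s1", fs=lambda r: f"ex:obs/{r['ts']}",
              fp=lambda r: "rdf:type", fo=lambda r: "ssn:Observation")
streams = {"s1": [{"ts": 1}, {"ts": 2}]}
tp1 = ("?obs", "rdf:type", "ssn:Observation")
bindings = eval_triple_pattern(tp1, mu1, streams)
```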
  • Rewrite to algebra: the evaluation of $gp$ can then be represented as the following algebra expression:
    $$\mathrm{eval}(gp, M) = \omega_{ts,te,\delta}\; \pi_{f_s^{\mu_1}}(s_1) \bowtie \pi_{f_s^{\mu_2}, f_o^{\mu_2}}(s_1) \bowtie \pi_{f_s^{\mu_4}, f_o^{\mu_4}}(s_1) \bowtie \pi_{f_s^{\mu_5}, f_o^{\mu_5}}(s_1)$$
    This expression can be represented as a tree (Figure 4.1), where the leaf nodes are the streams and the other nodes are the relational streaming operators.
    Figure 4.1: Tree representation of the evaluation of a SPARQLStream query rewritten as an algebra expression.
  • Rewriting and Execution Process
  • Execution process
  • SRBench Datasets: real-world U.S. weather data¹, the first & largest sensor dataset in LOD. LinkedSensorData comprises LinkedSensorMetadata and LinkedObservationData: ~20k US weather stations, ~100k sensors; links to nearby locations in GeoNames; hurricane & blizzard observations in the US; ~1.73 billion RDF triples; ~159 million observations.
    Name    | Storm type | Date               | #Triples    | #Observations | Data size
    Bill    | Hurricane  | Aug. 17 – 22, 2009 | 231,021,108 | 21,272,790    | ~15 GB
    Ike     | Hurricane  | Sep. 01 – 13, 2008 | 374,094,660 | 34,430,964    | ~34 GB
    Gustav  | Hurricane  | Aug. 25 – 31, 2008 | 258,378,511 | 23,792,818    | ~17 GB
    Bertha  | Hurricane  | Jul. 06 – 17, 2008 | 278,235,734 | 25,762,568    | ~13 GB
    Wilma   | Hurricane  | Oct. 17 – 23, 2005 | 171,854,686 | 15,797,852    | ~10 GB
    Katrina | Hurricane  | Aug. 23 – 30, 2005 | 203,386,049 | 18,832,041    | ~12 GB
    Charley | Hurricane  | Aug. 09 – 15, 2004 | 101,956,760 | 9,333,676     | ~7 GB
    –       | Blizzard   | Apr. 01 – 06, 2003 | 111,357,227 | 10,237,791    | ~2 GB
    ¹ http://mesowest.utah.edu
  • SRBench Queries (17 queries), by feature: graph pattern matching (and, filter, union, optional); solution modifier (projection, distinct); query form (select, construct, ask); SPARQL 1.1 (aggregate, subquery, select expr, property path); reasoning (subclass, subproperty, sameAs); streaming (time window, istream, dstream, rstream); data access (observations, sensor metadata, geonames, dbpedia)
  • Query Features (Q1–Q17):
    Query | 1. Graph pattern | 2. Solution modifier | 3. Query form | 7. Dataset
    Q1    | A                | P,D                  | S             | O
    Q2    | A,F,O            | P,D                  | S             | O
    Q3    | A                | P                    | A             | O
    Q4    | A,F              | P                    | S             | O
    Q5    | A                | P                    | C             | O
    Q6    | A,F,U            | P                    | S             | O
    Q7    | A                | P,D                  | S             | O
    Q8    | A                | P                    | S             | O,S
    Q9    | A                | P                    | S             | O,S
    Q10   | A                | P,D                  | S             | O,S
    Q11   | A,F              | P,D                  | S             | O,S
    Q12   | A,F,U            | P                    | S             | O,S,G
    Q13   | A,F              | P                    | S             | O,S,G
    Q14   | A,F,U            | P,D                  | S             | O,S,G
    Q15   | A,F              | P                    | S             | O,S,D
    Q16   | A,F              | P                    | S             | O,S,G,D
    Q17   | A,F              | P                    | S             | S
    Entries of rows 4–6, in slide order: 4. SPARQL 1.1: F,P; A; A,E,M,F; A,S; N; A,E,M; A,E,M; A,S,M,F; A,S,E,M,F,P; A,E,M,F,P; F,P; A,E,M,P; P; P. 5. Reasoning: C; R; C; A; C. 6. Streaming: T; T; T; T; T; T; T,D; T; T; T; T; T; T; T; T.
    Key:
    1. And (A), Filter (F), Union (U), Optional (O)
    2. Projection (P), Distinct (D)
    3. Select (S), Construct (C), Ask (A)
    4. Aggregate (A), Subquery (S), Negation (N), Expression in SELECT (E), assignment (M), Functions & operators (F), Property path (P)
    5. subClassOf (C), subPropertyOf (R), owl:sameAs (A)
    6. Time-based window (T), Istream (I), Dstream (D), Rstream (R)
    7. LinkedObservationData (O), LinkedSensorMetadata (S), GeoNames (G), DBpedia (D)