Due to the decentralised and autonomous architecture of the Web of Data, data replication and local deployment of SPARQL endpoints are inevitable. Nowadays, it is common to have multiple copies of the same dataset accessible through various SPARQL endpoints, which leads to the problem of selecting the optimal data source for a user query based on the properties of the data and the requirements of the user or the application. Quality of Service (QoS) parameters can play a pivotal role in selecting the optimal data source according to the user's requirements. QoS parameters have been widely studied in the context of web service selection. However, to the best of our knowledge, the potential of associating QoS parameters with SPARQL endpoints for optimal data source selection has not been investigated.
In this paper, we define various QoS parameters associated with SPARQL endpoints and present a semantic model for QoS parameters
and their evaluation. We present a monitoring service for the SPARQL
endpoint which automatically evaluates the QoS metrics of any given
SPARQL endpoint. We demonstrate the utility of our monitoring service
by implementing an extension of the SPARQL query language, which
caters for user requirements based on QoS parameters and selects the
optimal data source for a particular user query over federated sources.
Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...Databricks
Both Spark and HBase are widely used, but how to use them together with high performance and simplicity is a very hard topic. Spark HBase Connector (SHC) provides feature-rich and efficient access to HBase through Spark SQL. It bridges the gap between the simple HBase key value store and complex relational SQL queries, and enables users to perform complex data analytics on top of HBase using Spark.
SHC implements the standard Spark data source APIs, and leverages the Spark catalyst engine for query optimization. To achieve high performance, SHC constructs the RDD from scratch instead of using the standard HadoopRDD. With the customized RDD, all critical techniques can be applied and fully implemented, such as partition pruning, column pruning, predicate pushdown and data locality. The design makes the maintenance very easy, while achieving a good tradeoff between performance and simplicity. Also, SHC has integrated natively with Phoenix data types. With SHC, Spark can execute batch jobs to read/write data from/into Phoenix tables. Phoenix can also read/write data from/into HBase tables created by SHC. For example, users can run a complex SQL query on top of an HBase table created by Phoenix inside Spark, perform a table join against a DataFrame which reads the data from a Hive table, or integrate with Spark Streaming to implement a more complicated system.
This session will demonstrate how SHC works, how to use SHC in secure/non-secure clusters, how SHC works with multi-HBase clusters and how Spark reads/writes data from/into Phoenix tables with SHC, etc. It will also benefit people who use Spark and other data sources (besides HBase) as it inspires them with ideas of how to support high performance data source access at the Spark DataFrame level.
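To make the DataFrame-level access concrete, here is a minimal PySpark sketch of reading an HBase table through SHC. It assumes the SHC jar is on the Spark classpath; the data source class name and the "catalog" option key follow common SHC examples and should be treated as assumptions, and the table and column names are made up for illustration.

    import json
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("shc-demo").getOrCreate()

    # Catalog mapping an HBase table ("Contacts") onto a DataFrame schema.
    catalog = json.dumps({
        "table": {"namespace": "default", "name": "Contacts"},
        "rowkey": "key",
        "columns": {
            "id":   {"cf": "rowkey",   "col": "key",  "type": "string"},
            "name": {"cf": "personal", "col": "name", "type": "string"},
        },
    })

    df = (spark.read
          .options(catalog=catalog)
          .format("org.apache.spark.sql.execution.datasources.hbase")  # SHC data source
          .load())

    # Projections and filters like these are candidates for SHC's column
    # pruning and predicate pushdown.
    df.select("id", "name").filter(df.id == "row42").show()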
Many organizations today process various types of data in different formats, and most often this data is free-form. As the number of consumers of this data grows, it becomes imperative that this free-flowing data adhere to a schema. A schema gives data consumers a clear expectation of the type of data they are getting, and it shields them from immediate impact if the upstream source changes its format. Having a uniform schema representation also gives the data pipeline an easy way to integrate with and support various systems that use different data formats.
Schema Registry is a central repository for storing and evolving schemas. It provides an API and tooling that help developers and users register a schema and consume it without being impacted when the schema changes. Users can tag different schemas and versions, register for notifications of schema changes, and so on.
In this talk, we will go through the need for a schema registry and schema evolution, and showcase the integration with Apache NiFi, Apache Kafka and Apache Storm.
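As a rough illustration of the register/consume flow described above, here is a hedged Python sketch that talks to a schema registry's REST API with the requests library. The base URL and endpoint paths are hypothetical placeholders, not the exact Hortonworks Schema Registry API.

    import json
    import requests

    REGISTRY = "http://schema-registry.example.com:9090/api/v1"  # hypothetical base URL

    truck_schema = {
        "type": "record",
        "name": "Truck",
        "fields": [{"name": "id", "type": "long"},
                   {"name": "speed", "type": "double"}],
    }

    # Producer side: register a new version of the "truck" schema (hypothetical path).
    resp = requests.post(f"{REGISTRY}/schemas/truck/versions",
                         json={"schemaText": json.dumps(truck_schema)})
    resp.raise_for_status()
    print("registered:", resp.json())

    # Consumer side: fetch the latest version before deserializing incoming
    # records, so a schema change upstream does not break the consumer.
    latest = requests.get(f"{REGISTRY}/schemas/truck/versions/latest").json()
    print("latest schema:", latest)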
GDPR Community Showcase for Apache Ranger and Apache AtlasDataWorks Summit
The communities for Apache Atlas and Apache Ranger, which are foundational components for Security and Governance across the Hadoop stack, have spawned a robust industry ecosystem of tools and platforms. Such industry solutions build upon the extensibility offered via open and robust APIs and integration patterns to provide innovative “better together” capabilities. In this talk, we will showcase how the ecosystem of solutions being built by different vendors provides value-added capabilities to address the key aspects of securing and governing your data lakes based on the Apache Ranger and Apache Atlas frameworks. The talk will feature multiple ecosystem demonstrations, including how to identify, map, and classify personal data, harvest and maintain metadata, track and map the movement of data through your enterprise, and enforce appropriate controls to monitor access and usage of personal data.
Come hear from community partners:
-Balaji Ganesan from Privacera will showcase how Privacera integrates with and leverages Apache Ranger and Apache Atlas features to help with GDPR compliance
-Greg Goldsmith and Jordan Martz from Attunity will showcase how Attunity’s solutions integrate into Apache Atlas to provide robust chain of custody and classifications required for GDPR
-Somil Kulkarni from IBM will demonstrate how IBM Information Governance Catalog integrates with Apache Atlas to exchange metadata and build a connected solution for GDPR compliance that harnesses both open source community enhancements and IBM’s innovations in the governance space.
Speakers
Ali Bajwa, Principal Solutions Engineer, Hortonworks
Srikanth Venkat, Senior Director Product Management, Hortonworks
Migration from Oracle to PostgreSQL: NEED vs REALITYAshnikbiz
Some of the largest organizations in the world today are cutting costs by innovating in their database layer. Migrating workloads from legacy systems to an enterprise open source database technology like Postgres is a preferred choice for many.
Apache Pulsar at Tencent Game: Adoption, Operational Quality Optimization Exp...StreamNative
After nearly 10 years of development of Tencent Game big data, the daily data transmission volume can reach 1.7 trillion. As a key component of the big data platform, the MQ system is critical to providing real-time service operational quality assurance, and it must support applications such as real-time game operations, real-time index data analysis, and real-time personalized recommendation. With the fast growth of the gaming business and the continuous expansion of data, the challenge of assuring real-time service operational quality keeps increasing.
In this presentation, we will introduce the development history of Tencent Game big data technology and our practical experience optimizing operational service quality for Apache Pulsar in Tencent Game real-time service scenarios.
Stream processing has become the de facto standard for building real-time ETL and stream analytics applications. We see batch workloads move into stream processing to act on the data and derive insights faster. With the explosion of data carrying "perishable insights", such as IoT and machine-generated data, stream processing plus predictive analytics is driving tremendous business value. This is evidenced by the proliferation of stream processing frameworks, from the proven and still-evolving Apache Storm to newer frameworks such as Apache Flink, Apache Apex, and Spark Streaming.
Today, users have to choose among these frameworks, understand the benefits of each, learn new APIs, and operationalize their applications. To create value faster, we are introducing a new open source tool, Streamline: a self-service framework that makes it easy to build streaming applications and deploy them across whichever frameworks/engines users prefer. It simplifies integration with machine learning models for scoring and classification of data for predictive analytics, and it provides an elegant way to build analytics dashboards that derive business insights from streaming data and let business users consume them easily.
In this talk, we will outline the fundamentals of real-time stream processing and demonstrate Streamline capabilities to show how it simplifies building real-time streaming analytics applications.
Speaker:
Priyank Shah, Staff Software Engineer, Hortonworks
Apache Hive is an enterprise data warehouse built on top of Hadoop. Hive supports Insert/Update/Delete SQL statements with transactional semantics and read operations that run at snapshot isolation. This talk will describe the intended use cases, the architecture of the implementation, new features such as the SQL MERGE statement, and recent improvements. The talk will also cover the Streaming Ingest API, which allows writing batches of events into a Hive table without using SQL. This API is used by Apache NiFi, Storm and Flume to stream data directly into Hive tables and make it visible to readers in near real time.
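As a sketch of the transactional semantics mentioned above, the following Python snippet issues a Hive MERGE (upsert) through PyHive; it assumes an ACID-enabled (transactional, ORC, bucketed) target table, and the host name and table names are placeholders.

    from pyhive import hive

    conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="etl")
    cur = conn.cursor()

    # Upsert staged rows into the ACID table in a single transactional statement.
    cur.execute("""
        MERGE INTO customers AS t
        USING customers_staging AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET email = s.email
        WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.email)
    """)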
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
When interacting with analytics dashboards, two key requirements for a smooth user experience are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/HBase work well for storing large volumes of data, but they are not optimized for ingesting streaming data and making it available for queries in real time. Their long query latencies also make these systems sub-optimal choices for powering interactive dashboards and BI use cases.
In this talk we will present Druid as a complementary solution to existing Hadoop-based technologies. Druid is an open-source analytics data store designed from scratch for OLAP and business intelligence queries over massive data streams. It provides low-latency real-time data ingestion and fast, sub-second, ad hoc data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how Druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent Improvements in Druid moving from lambda architecture to Exactly once Ingestion
6) Future Work
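For a flavour of the sub-second ad hoc queries discussed above, here is a small Python sketch posting a query to Druid's SQL endpoint with the requests library. The broker address and the wikipedia datasource are placeholders borrowed from the usual Druid quickstart setup.

    import requests

    BROKER = "http://druid-broker.example.com:8082"  # placeholder broker address

    sql = """
        SELECT channel, COUNT(*) AS edits
        FROM wikipedia
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        GROUP BY channel
        ORDER BY edits DESC
        LIMIT 10
    """

    # Druid's SQL API accepts the query as JSON and returns one JSON object per row.
    resp = requests.post(f"{BROKER}/druid/v2/sql", json={"query": sql})
    resp.raise_for_status()
    for row in resp.json():
        print(row)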
A Journey to Reactive Function ProgrammingAhmed Soliman
A gentle introduction to functional reactive programming that highlights the Reactive Manifesto and ends with a demo in RxJS: https://github.com/AhmedSoliman/rxjs-test-cat-scope
Many enterprises are implementing Hadoop projects to manage and process large datasets. A big question is how to configure Hadoop clusters to connect to an enterprise directory containing 100k+ users and groups for access management. Several large enterprises have complex directory servers for managing users and groups, and many advanced features have recently been added to Hadoop user management in order to support various complex directory server structures.
In this session attendees will learn about setting up Hadoop nodes with users from Active Directory for executing Hadoop jobs, setting up authentication for enterprise users, and setting up authorization for users and groups using Apache Ranger. Attendees will also learn about the common challenges faced in enterprise environments while interacting with Active Directory, including filtering out the users to be brought into Hadoop from Active Directory, restricting access to a set of users from Active Directory, handling users from nested group structures, etc.
Speakers
Sailaja Polavarapu, staff Software Engineer, Hortonworks
Velmurugan Periasamy, Director - Engineering, Hortonworks
Apache Ambari is an extensible framework that simplifies provisioning, managing and monitoring Hadoop clusters. Apache Ambari was built on a standardized stack-based operations model. Stacks wrap services of all shapes and sizes with a consistent definition and lifecycle-control layer; thereby providing a consistent approach for managing and monitoring the services. This also provided a natural extension point for operators and the community to bring in their own add-on services and “plug-in” the new services into the stack.
However, one of the fundamental limitations of the current Apache Ambari architecture has been the strong one-to-one coupling between entities. For instance, a cluster is tied to a single stack and a Hadoop operator can only deploy services defined in that stack, a cluster can have only a single instance of a service, and a host can have only a single instance of a component. Taking into consideration the various use case scenarios that cannot be enabled due to these limitations, there is a growing need to revamp the Ambari architecture.
In this talk, we propose a revamped Apache Ambari architecture that will open up the floodgates for a wide range of scenarios that wouldn’t have been possible thus far. We will focus the discussion on a new mpack-based operations model that will replace the stack-based operations model. A management package is a self-contained deployment artifact that includes all the details for deploying, managing and upgrading a set of services bundled in the package. A third-party provider can also build their own management package containing their custom services. This eliminates the need to plug their services into a stack and lets them define their own upgrade story for these custom services. A Hadoop operator will be able to deploy a Hadoop cluster with a mix of services across multiple packages instead of being limited to a single stack. For example, it would be possible to deploy a cluster with HDFS from HDP and NiFi from HDF.
Further, we will also discuss the architectural changes needed to enable a multi-instance architecture in future Ambari releases, supporting multiple instances of a service in a cluster and multiple instances of a component on a host, as well as future-proofing the Ambari architecture to leverage some of the advancements happening in the Hadoop community such as YARN services (YARN-4692). We will wrap up the conversation with a brief overview of other improvements planned for future releases of Ambari.
Hortonworks technical workshop operations with ambariHortonworks
Ambari continues on its journey of provisioning, monitoring and managing enterprise Hadoop deployments. With 2.0, Apache Ambari brings a host of new capabilities including updated metric collections; Kerberos setup automation and developer views for Big Data developers. In this Hortonworks Technical Workshop session we will provide an in-depth look into Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the github demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
Learn how Hortonworks Data Flow (HDF), powered by Apache Nifi, enables organizations to harness IoAT data streams to drive business and operational insights. We will use the session to provide an overview of HDF, including detailed hands-on lab to build HDF pipelines for capture and analysis of streaming data.
Recording and labs available at:
http://hortonworks.com/partners/learn/#hdf
Apache Spark 2.0 set the architectural foundations of structure in Spark, unified high-level APIs, structured streaming, and the underlying performant components like Catalyst Optimizer and Tungsten Engine. Since then the Spark community has continued to build new features and fix numerous issues in releases Spark 2.1 and 2.2.
Apache Spark 2.3 & 2.4 has made similar strides too. In this talk, we want to highlight some of the new features and enhancements, such as:
• Apache Spark and Kubernetes
• Native Vectorized ORC and SQL Cache Readers
• Pandas UDFs for PySpark
• Continuous Stream Processing
• Barrier Execution
• Avro/Image Data Source
• Higher-order Functions
Speaker: Robert Hryniewicz, AI Evangelist, Hortonworks
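As a small illustration of one of the items above, the Pandas UDFs introduced in Spark 2.3, here is a minimal PySpark sketch of a scalar Pandas UDF (PyArrow must be installed); the column and function names are made up for the example.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = SparkSession.builder.appName("pandas-udf-demo").getOrCreate()

    @pandas_udf("double", PandasUDFType.SCALAR)
    def celsius_to_fahrenheit(c):
        # Receives and returns a whole pandas Series, avoiding per-row overhead.
        return c * 9.0 / 5.0 + 32.0

    df = spark.createDataFrame([(0.0,), (21.5,), (100.0,)], ["celsius"])
    df.select(celsius_to_fahrenheit("celsius").alias("fahrenheit")).show()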
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
For the first time, Hortonworks Data Platform ships with Apache Storm for processing stream data in Hadoop.
In this presentation, Himanshu Bari, Hortonworks senior product manager, and Taylor Goetz, Hortonworks engineer and committer to Apache Storm, cover Storm and stream processing in HDP 2.1:
+ Key requirements of a streaming solution and common use cases
+ An overview of Apache Storm
+ Q & A
Akka is a runtime framework for building resilient, distributed applications in Java or Scala. In this webinar, Konrad Malawski discusses the roadmap and features of the upcoming Akka 2.4.0 and reveals three upcoming enhancements that enterprises will receive in the latest certified, tested build of Typesafe Reactive Platform.
Akka Split Brain Resolver (SBR)
Akka SBR provides advanced recovery scenarios in Akka Clusters, improving on the safety of Akka’s automatic resolution to avoid cascading partitioning.
Akka Support for Docker and NAT
Run Akka Clusters in Docker containers or NAT with complete hostname and port visibility on Java 6+ and Akka 2.3.11+
Akka Long-Term Support
Receive Akka 2.4 support for Java 6, Java 7, and Scala 2.10
Slides from http://www.meetup.com/Reactive-Systems-Hamburg/events/232887060
Barys and Simon talked about Akka Cluster. Cluster Sharding makes it possible to transparently distribute work in an Akka cluster, with automatic balancing, migration of workers and automatic restarts in case of errors. Cluster PubSub offers the publish/subscribe pattern. Akka Distributed Data offers eventually consistent data structures across the cluster that can be used to keep the cluster's state.
They talked about the Akka modules and explained how they interplay. Finally, they shared what Risk.Ident has learned from running a reactive application based on Akka Cluster in production for almost a year.
The SWA Country Stories capture best practices from partners around the world. They include partners' experiences in using the SWA partnership to advance the cause of water, sanitation and hygiene in their countries and in implementing the commitments countries made at the SWA High Level Meetings. For more information: sanitationandwaterforall.org
Drawing on FANSA's experience of engaging with SWA, Ramisetty Murali from Fresh Water Action Network South Asia (FANSA) gave a presentation on the topic of "Learning and achievements of SWA Global platform and its relevance to achieving Hygiene and Sanitation Development in India".
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference InformationKai Schlegel
Presentation given by Kai Schlegel at the 5th International Workshop on Data Engineering meets the Semantic Web (DESWeb), held in conjunction with ICDE 2014, Chicago, IL, USA, March 31, 2014.
A practical introduction to Oracle NoSQL Database - OOW2014Anuj Sahni
Not familiar with Oracle NoSQL Database yet? This great product introduction session discusses the primary functionality included with the product as well as integration with other Oracle products. It includes a live demo that illustrates installation and configuration as well as data modeling and sample NoSQL application development.
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsEnrico Daga
Presented at #SALAD2015
The heterogeneity of methods and technologies for publishing open data is still an obstacle to developing distributed systems on the Web. On the one hand, Web APIs, the most popular approach to offering data services, implement REST principles, which focus on addressing loose coupling and interoperability issues. On the other hand, Linked Data, available through SPARQL endpoints, focuses on data integration between distributed data sources. We propose BASIL, an approach to building Web APIs on top of SPARQL endpoints, in order to benefit from the advantages of both the Web API and Linked Data approaches. Compared to similar solutions, BASIL aims at minimising the learning curve for users in order to promote its adoption. The main feature of BASIL is a simple API that does not introduce new specifications, formalisms or technologies for users from either the Web API or Linked Data communities.
The Query Service is the new platform solution for querying a variety of data sources. The goal of the Query Service is to let administrators configure a metadata description of a data source that end users can then use without detailed knowledge of the underlying data source. This session explains how to configure Query Service data sources and use them with the RESTful API or component collection.
Capital One: Using Cassandra In Building A Reporting PlatformDataStax Academy
As a leader in the financial industry, Capital One applications generate huge amounts of data that require fast and accurate handling, storage and analysis. We are transforming how we report operational data to our internal users so that they can make quick and precise business decisions to serve our customers. As part of this transformation, we are building a new Go-based data processing framework that will enable us to transfer data from multiple data stores (RDBMS, files, etc.) to a single NoSQL database - Cassandra. This new NoSQL store will act as a reporting database that will receive data on a near real-time basis and serve the data through scorecards and reports. We would like to share our experience in defining this fast data platform and the methodologies used to model financial data in Cassandra.
A Practical Guide To End-to-End Tracing In Event Driven ArchitecturesHostedbyConfluent
"Can you determine how a given event came to be? Is it an aggregation, a combination of multiple events with different sources? What are its origins?
As event driven architectures become more sophisticated, with features such as stateful stream processing, data joining, and multi-cluster flows, it becomes harder to trace the path of an event, its origins and touch points. At the same time, it also becomes more important.
Using code examples and usage scenarios we will dive into the tracing capabilities of OpenTelemetry for Kafka clients, including those using the Consumer/Producer and Kafka Streams libraries, as well as the Connect and ksqlDB platforms. This will culminate in an end-to-end tracing pipeline demonstration.
This talk will cover the following topics:
- Distributed tracing concepts, including context propagation and the OpenTelemetry implementation stack
- OpenTelemetry’s Kafka instrumentation, what is supported out of the box, code examples, edge cases, challenges and solutions
- A demonstration of an end-to-end tracing implementation
In this session, you will gain an understanding of the importance of end-to-end traceability, and several tools & examples for improving observability in your own distributed event driven applications."
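To give a feel for the client-side instrumentation discussed above, here is a hedged Python sketch that traces a kafka-python producer with OpenTelemetry's Kafka instrumentation package; the broker address and topic are placeholders, and the setup is deliberately minimal (spans are just printed to stdout).

    from kafka import KafkaProducer
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
    from opentelemetry.instrumentation.kafka import KafkaInstrumentor

    # Minimal tracer setup that prints finished spans to stdout.
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    # Auto-instrument kafka-python: each send() gets a span and the trace context
    # is injected into the message headers so downstream consumers can continue it.
    KafkaInstrumentor().instrument()

    producer = KafkaProducer(bootstrap_servers="kafka.example.com:9092")
    producer.send("orders", b'{"order_id": 42}')
    producer.flush()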
SAP FIORI COEP Pune - pavan golesar (ppt)Pavan Golesar
Hi,
This material is not for commercial purpose, Disclaimer: Copyright content included.
For learning purpose only.
sapparamount@gmail.com
Pavan Golesar
Honest Performance Testing with "NDBench" (Vinay Chella, Netflix) | Cassandra...DataStax
Apache Cassandra makes it possible to execute millions of operations per second in a scalable fashion. Harnessing the power of C* leaves many developers pondering the following:
- Is my data model appropriate and not going to end up as wide partition(s) causing heap pressure and other issues?
- How do I tune my connection pool configuration? What are the optimal settings for my environment ?
- What is my C* cluster capacity in terms of number of IOPs for a given 95th and 99th latency?
- How do I perf-test my data access layer?
In this talk, Vinay Chella, Cloud Data Architect @ Netflix, will share the open source tools, techniques and platform (NDBench) that Netflix uses to perf-test its C* fleet with simulations of millions of operations per second.
About the Speaker
Vinay Chella Cloud Data Architect, NETFLIX Inc
Vinay Chella is a Cloud Data Architect at Netflix with a deep understanding of Cassandra and other RDBMSs. As an engineer and architect, he works extensively on data modeling, performance tuning and guiding best practices for various persistence stores, helping various teams @ Netflix build next-generation data access layers.
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Christian Tzolov
When working with BigData & IoT systems we often feel the need for a Common Query Language. The system specific languages usually require longer adoption time and are harder to integrate within the existing stacks.
To fill this gap some NoSQL vendors are building SQL access to their systems. Building a SQL engine from scratch is a daunting job, and frameworks like Apache Calcite can help you with the heavy lifting. Calcite allows you to integrate a SQL parser, a cost-based optimizer, and JDBC with your NoSQL system.
We will walk through the process of building a SQL access layer for Apache Geode (In-Memory Data Grid). I will share my experience, pitfalls and technical considerations, such as balancing SQL/RDBMS semantics against the design choices and limitations of the data system.
Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system.
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations; these goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. vertices with the same in-links, helps avoid duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
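A minimal plain-Python sketch of just one of the techniques above, skipping rank updates for vertices whose rank has already converged; it is a toy heuristic for illustration, not the STICD algorithm, and it assumes a graph without dangling nodes.

    def pagerank(graph, d=0.85, tol=1e-6, max_iter=100):
        """graph: dict mapping each vertex to the list of its out-neighbours."""
        n = len(graph)
        rank = {v: 1.0 / n for v in graph}
        converged = set()
        incoming = {v: [] for v in graph}           # precompute in-neighbours once
        for u, outs in graph.items():
            for v in outs:
                incoming[v].append(u)
        for _ in range(max_iter):
            new_rank = dict(rank)
            for v in graph:
                if v in converged:                  # skip vertices that have settled
                    continue
                s = sum(rank[u] / len(graph[u]) for u in incoming[v])
                new_rank[v] = (1.0 - d) / n + d * s
                if abs(new_rank[v] - rank[v]) < tol:
                    converged.add(v)
            rank = new_rank
            if len(converged) == n:
                break
        return rank

    print(pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))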
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
The affect of service quality and online reviews on customer loyalty in the E...
How good is your SPARQL endpoint? A QoS-Aware SPARQL Endpoint Monitoring and Data Source Selection Mechanism for Federated SPARQL Queries
1. How good is your SPARQL endpoint?
A QoS-aware SPARQL endpoint monitoring and data source
selection mechanism for federated SPARQL queries
Ali Intizar and Alessandra Mileo
5. Linked Open Data and SPARQL Endpoints
• Linked Data
• LOD cloud
• SPARQL Endpoints
  • Both public and private
  • Allow easy access to linked data using SPARQL queries
  • Provide a querying interface
• Open Data Management Tools
  • Datahub
  • LOD Stats
• SPARQL Endpoint Description
  • Vocabulary for Interlinking Datasets (VoID)
  • Service Description (SD)
14. Ranking of the SPARQL Endpoints
• Multiple SPARQL endpoints can represent the same dataset
• Which one is the best for me?
• Ranking of the SPARQL endpoints
  • Based on QoI/QoS parameters
19. QoS Parameters for SPARQL Endpoints
For QoS-based ranking of the SPARQL endpoints:
• Identification of the various QoS parameters associated with the SPARQL endpoints
• Semantic representation of the identified QoS parameters
• Extension of the existing SPARQL endpoint description vocabularies (VoID/SD) to associate QoS parameters
• Evaluation techniques for the QoS metrics
• Continuous monitoring of the SPARQL endpoints to generate QoS profiles
21. QoS Parameters for SPARQL Endpoints
• Performance
  • Response Time
  • Execution Time
  • Throughput
  • Error Rate
• Data Quality
  • Accuracy
  • Data Consistency
  • Completeness
  • Freshness
23. QoS Parameters for SPARQL Endpoints
• Interoperability
  • SPARQL Version
  • Additional Features
  • Restricted Features
• Availability
  • UpTime
  • DownTime
  • MeanUpTime
  • MTTR
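As an illustration of how the availability figures above could be derived from a monitoring log, here is a small Python sketch; the probe format (one boolean per fixed-interval check) is an assumption for illustration, and MTTR is approximated as the mean duration of an outage.

    def availability_metrics(probes, interval_s=60):
        """probes: chronological list of booleans, True = endpoint answered the probe."""
        up = sum(probes) * interval_s
        down = (len(probes) - sum(probes)) * interval_s
        # An outage is a maximal run of failed probes.
        outages, run = [], 0
        for ok in probes:
            if ok:
                if run:
                    outages.append(run * interval_s)
                run = 0
            else:
                run += 1
        if run:
            outages.append(run * interval_s)
        mttr = sum(outages) / len(outages) if outages else 0.0
        total = up + down
        return {"uptime_s": up, "downtime_s": down,
                "availability": up / total if total else 0.0, "mttr_s": mttr}

    print(availability_metrics([True, True, False, False, True, True, False, True]))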
28. QoS Parameters for SPARQL Endpoints
For QoS-based ranking of the SPARQL endpoints:
• Identification of the various QoS parameters associated with the SPARQL endpoints
• Semantic representation of the identified QoS parameters
• Extension of the existing SPARQL endpoint description vocabularies (VoID/SD) to associate QoS parameters
• Evaluation techniques for the QoS metrics
• Continuous monitoring of the SPARQL endpoints to generate QoS profiles
29. QoS Parameters for SPARQL Endpoints
• Semantic Description of SPARQL Endpoint (VoID/SD)
• QoS Profile of SPARQL Endpoints
[Diagram: a SPARQL Endpoint is linked through a "has" property to a QoSProfile; QoSProfileEndpoint, QoSProfileDefault and QoSProfileUser are subclasses of QoSProfile, i.e. a QoS profile can be (1) endpoint-provided, (2) default, or (3) user-defined.]
33. QoS Parameters for SPARQL Endpoints
[Diagram: the QoS vocabulary. A QoSProfile contains QoSParameters. Each QoSParameter has a Name (hasName), a QoSWeight (hasWeight), a QoSCategory (hasCategory), a Tendency (hasTendency) and a QoSMetric (hasMetric), and can be linked to external definitions via sameAs. A QoSMetric is measured in a QoSUnit (isMeasuredIn) and has a Value (hasValue), which is either a NumericValue or a TextValue. Metrics are subdivided into NumericMetric (ExactNumeric, or IntervalNumeric with start and end) and NonNumericMetric (BooleanMetric with Yes/No, LinguisticMetric with text values, and GradingMetric with Low/Mid/High grades).]
45. QoS Parameters for SPARQL Endpoints
For QoS-based ranking of the SPARQL endpoints
• Identification of the various QoS parameters associated with
the SPARQL endpoints
• Semantic representation of the identified QoS parameters
• Extension of the existing SPARQL endpoint description
vocabularies (VoID/SD) to associate QoS parameters
• Evaluation techniques for the QoS metrics
• Continuous monitoring of the SPARQL endpoints to generate
QoS profiles
46. Evaluation of the QoS Parameters
• Performance
• Response Time
Q1. SELECT ?p WHERE { <s> ?p <o> }
Q2. SELECT ?o WHERE { <s1> <p1> ?o .
                      <s2> <p2> ?o }
47. Evaluation of the QoS Parameters
• Performance
• Response Time
• Execution Time
Q1. SELECT ?p WHERE { <s> ?p <o> }
Q2. SELECT ?o WHERE { <s1> <p1> ?o .
                      <s2> <p2> ?o }
Q3. SELECT * WHERE { ?s ?p ?o } LIMIT 1000
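A minimal measurement sketch for these two metrics, assuming a hypothetical endpoint URL and using Q1 and Q3 from the slides; the paper's exact measurement protocol may differ.

import time
import requests

ENDPOINT = "http://example.org/sparql"   # hypothetical endpoint URL
Q1 = "SELECT ?p WHERE { <http://example.org/s> ?p <http://example.org/o> }"
Q3 = "SELECT * WHERE { ?s ?p ?o } LIMIT 1000"

def timed_query(endpoint, query):
    """Return the wall-clock time (seconds) needed to obtain the full result."""
    start = time.perf_counter()
    r = requests.get(endpoint, params={"query": query},
                     headers={"Accept": "application/sparql-results+json"},
                     timeout=60)
    r.raise_for_status()
    _ = r.json()                         # force the result body to be read and parsed
    return time.perf_counter() - start

response_time = timed_query(ENDPOINT, Q1)    # cheap lookup query
execution_time = timed_query(ENDPOINT, Q3)   # larger result set (LIMIT 1000)
print(f"ResponseTime={response_time:.3f}s ExecutionTime={execution_time:.3f}s")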
48. Evaluation of the QoS Parameters
• Performance
• Response Time
• Execution Time
• Throughput
Repeated execution of Q1.
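Throughput can be approximated by repeatedly issuing Q1 for a fixed window and counting completed queries per second; the sketch below is illustrative, with a hypothetical endpoint URL and an arbitrary window length.

import time
import requests

ENDPOINT = "http://example.org/sparql"   # hypothetical endpoint URL
Q1 = "SELECT ?p WHERE { <http://example.org/s> ?p <http://example.org/o> }"

def throughput(endpoint, query, window=60.0):
    """Completed queries per second over a fixed measurement window."""
    completed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < window:
        try:
            r = requests.get(endpoint, params={"query": query}, timeout=30)
            if r.ok:
                completed += 1
        except requests.RequestException:
            pass                         # failures are covered by the error-rate metric
    return completed / window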
49. Evaluation of the QoS Parameters
• Performance
• Response Time
• Execution Time
• Throughput
• Error Rate
Measured by counting the errors returned by the SPARQL
endpoint during the execution of the queries
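A possible error-rate counter along these lines, again with a hypothetical endpoint URL and Q1 as the probe query.

import requests

ENDPOINT = "http://example.org/sparql"   # hypothetical endpoint URL
Q1 = "SELECT ?p WHERE { <http://example.org/s> ?p <http://example.org/o> }"

def error_rate(endpoint, query, runs=100):
    """Fraction of query executions for which the endpoint returned an error."""
    errors = 0
    for _ in range(runs):
        try:
            r = requests.get(endpoint, params={"query": query}, timeout=30)
            if not r.ok:                       # HTTP-level error reported by the endpoint
                errors += 1
        except requests.RequestException:      # timeouts, connection failures
            errors += 1
    return errors / runs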
50. Evaluation of the QoS Parameters
• Interoperability
• SPARQL Version
• Additional Features
• Restricted Features
• SPARQL 1.1 test data set
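One way such a check could be scripted is sketched below; the feature probes are illustrative stand-ins for the SPARQL 1.1 test data set, and the endpoint URL is hypothetical.

import requests

# Small SPARQL 1.1-specific queries; an endpoint limited to SPARQL 1.0 will reject them.
FEATURE_TESTS = {
    "BIND":       "SELECT ?x WHERE { BIND(1 AS ?x) }",
    "Aggregates": "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
    "VALUES":     "SELECT ?x WHERE { VALUES ?x { 1 2 } }",
    "Subqueries": "SELECT ?s WHERE { { SELECT ?s WHERE { ?s ?p ?o } LIMIT 1 } }",
}

def supported_features(endpoint):
    """Return the subset of probed SPARQL 1.1 features the endpoint appears to support."""
    supported = []
    for name, query in FEATURE_TESTS.items():
        try:
            r = requests.get(endpoint, params={"query": query},
                             headers={"Accept": "application/sparql-results+json"},
                             timeout=30)
            if r.ok:
                supported.append(name)
        except requests.RequestException:
            pass
    return supported

print(supported_features("http://example.org/sparql"))   # hypothetical endpoint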
51. Evaluation of the QoS Parameters
• Availability
• UpTime
• DownTime
• MeanUpTime
• MTTR
• We rely on the service provider for the initial UpTime
• Periodic execution of query Q1 to monitor availability
• A DownTime counter is started whenever Q1 fails
• MeanUpTime is calculated as the percentage of time the
SPARQL endpoint was available since its initial UpTime
• Mean Time To Recover (MTTR) is calculated as the average
time taken by the SPARQL endpoint to recover after a failure
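A simplified monitoring loop for these availability metrics; the endpoint URL, probe interval and number of probes are illustrative assumptions, and an outage still ongoing at the end of the window is not counted towards MTTR.

import time
import requests

ENDPOINT = "http://example.org/sparql"   # hypothetical endpoint URL
Q1 = "SELECT ?p WHERE { <http://example.org/s> ?p <http://example.org/o> }"

def probe(endpoint, query, timeout=30):
    """Return True if the endpoint answers the probe query successfully."""
    try:
        r = requests.get(endpoint, params={"query": query},
                         headers={"Accept": "application/sparql-results+json"},
                         timeout=timeout)
        return r.ok
    except requests.RequestException:
        return False

def monitor(endpoint, interval=300, probes=12):
    """Periodically run Q1; derive MeanUpTime (%) and MTTR (seconds)."""
    up = 0
    downtimes, current_down = [], 0.0
    for _ in range(probes):
        if probe(endpoint, Q1):
            up += 1
            if current_down:                   # endpoint just recovered
                downtimes.append(current_down)
                current_down = 0.0
        else:
            current_down += interval           # accumulate DownTime
        time.sleep(interval)
    mean_uptime = 100.0 * up / probes
    mttr = sum(downtimes) / len(downtimes) if downtimes else 0.0
    return mean_uptime, mttr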
52. Evaluation of the QoS Parameters
• Licensing
• PDDL
• ODC-By
• ODC-ODbL
• CC0 1.0 Universal
Q6.
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?license WHERE {
?ds a void:Dataset .
?ds dcterms:license ?license .
}
53. Evaluation of the QoS Parameters
• Dataset Description
• Vocabulary of Interlinked Datasets (VoID)
• Service Description
Q4.
PREFIX void: <http://rdfs.org/ns/void#>
SELECT ?ds WHERE {
?ds a void:Dataset .
?ds void:sparqlEndpoint <SPARQLEndpointURI>
}
54. Evaluation of the QoS Parameters
• ResultSet
• Size Limit
• Result Format
Q5.
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>
SELECT ?format WHERE {
?s a sd:Service .
?s sd:endpoint <endpointURI> .
?s sd:resultFormat ?format . }
55. Evaluation of the QoS Parameters
• Data Quality
• Accuracy
• Data Consistency
• Completeness
• Freshness
Data quality is an overlap between quality of
information (QoI) and quality of service (QoS)
56. QoS Parameters for SPARQL Endpoints
For QoS-based ranking of the SPARQL endpoints
• Identification of the various QoS parameters associated with
the SPARQL endpoints
• Semantic representation of the identified QoS parameters
• Extension of the existing SPARQL endpoint description
vocabularies (VoID/SD) to associate QoS parameters
• Evaluation techniques for the QoS metrics
• Continuous monitoring of the SPARQL endpoints to generate
QoS profiles
59. Federated SPARQL Queries
• The SPARQL 1.1 federated query extension provides the SERVICE keyword
• Allows remote execution of SPARQL queries on several
endpoints
[Figure: a Federated SPARQL Query Engine with Source Selection, Indexing/Caching, Optimiser and Query Execution components, issuing SPARQL queries to multiple SPARQL endpoints.]
60. Federated SPARQL Queries
• Problem of data source selection
• Automated discovery of the SPARQL endpoints and execution of
any federated query over them
61. Federated SPARQL Queries
• Problem of data source selection
• Automated discovery of the SPARQL endpoints and execution of
any federated query over them
• Candidate Data Sources:
"Given a user's query Q and a set of n data sources
DS = { ds_i | i = 1..n },
we define the set of candidate data sources as
DS_c = { ds_c,j | j = 1..m }
that can potentially contribute to answering query Q, where
DS_c ⊆ DS and 1 ≤ m ≤ n."
62. Federated SPARQL Queries
• Problem of data source selection
• Automated discovery of the SPARQL endpoints and execution of
any federated query over them
• QoS Aware Data Sources:
"Given a set of candidate data sources DS_c,
we define the set of QoS aware data sources
DS_qos = { ds_qos,k | k = 1..l }
as the set of optimal data sources that can potentially contribute to
answering query Q and are compliant with the QoS requirements
stated in the query, where
DS_qos ⊆ DS_c and 1 ≤ l ≤ m ≤ n."
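A toy illustration of this selection step, assuming each candidate source's QoS profile has already been flattened into a plain dictionary of metric values; in the actual system these values would come from the RDF QoS profiles produced by the monitoring service.

def select_qos_aware(candidates, requirements):
    """
    candidates:   {endpoint_uri: {metric_name: value}} for the candidate sources DS_c
    requirements: list of (metric_name, predicate) pairs, e.g. ("ResponseTime", lambda v: v < 10)
    Returns DS_qos, the candidates whose profiles satisfy every requirement.
    """
    selected = {}
    for endpoint, profile in candidates.items():
        if all(name in profile and pred(profile[name]) for name, pred in requirements):
            selected[endpoint] = profile
    return selected

# Example usage with hypothetical endpoints and metric values:
candidates = {
    "http://host-a/sparql": {"ResponseTime": 4.2, "MeanUpTime": 97.0},
    "http://host-b/sparql": {"ResponseTime": 18.5, "MeanUpTime": 91.0},
}
requirements = [("ResponseTime", lambda v: v < 10), ("MeanUpTime", lambda v: v > 80)]
print(select_qos_aware(candidates, requirements))   # only host-a qualifies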
66. SPARQL Extension with QoS
• QoS requirements can be described as part of the SPARQL
query
• We introduce a new QOSREQ keyword in the SPARQL
query language
• The QOSREQ operator is applied to the triple pattern or BGP
immediately preceding the operator
• Comma-separated values of multiple QoS parameters within the
QOSREQ operator
• Comparison operators compare the user-defined QoS
requirements with the QoS profile of the SPARQL endpoint
67. SPARQL Extension with QoS
• QoS requirements can be described as part of the SPARQL
query
SELECT ?drug ?keggUrl ?chebiImage
WHERE {
  ?drug rdf:type drugbank:drugs .
  QOSREQ[ qs:ResponseTime < 10 , qs:SizeLimit > 10000 ]
  ?drug drugbank:keggCompoundId ?keggDrug .
  ?keggDrug bio2rdf:url ?keggUrl .
  {
    ?drug drugbank:genericName ?drugBankName .
    ?chebiDrug purl:title ?drugBankName .
  }
  QOSREQ[ qs:DatasetDescription = 'VoID' , qs:MeanUpTime > 80 ]
  ?chebiDrug chebi:image ?chebiImage . }
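As an illustration of how such an extended query might be pre-processed, the sketch below (a hypothetical example, not the authors' implementation) strips QOSREQ[...] blocks from the query text and turns them into constraint checks that can be evaluated against an endpoint's QoS profile.

import re
import operator

OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq,
       "<=": operator.le, ">=": operator.ge}

QOSREQ_RE = re.compile(r"QOSREQ\[(.*?)\]", re.S)

def extract_constraints(query):
    """Return (plain SPARQL query, list of (parameter, op, value) constraints)."""
    constraints = []
    for block in QOSREQ_RE.findall(query):
        for part in block.split(","):
            m = re.match(r"\s*qs:(\w+)\s*(<=|>=|<|>|=)\s*'?([\w.]+)'?", part)
            if m:
                constraints.append(m.groups())
    return QOSREQ_RE.sub("", query), constraints

def satisfies(profile, constraints):
    """profile: dict of QoS metric values for one endpoint (e.g. from its QoS profile)."""
    for name, op, value in constraints:
        actual = profile.get(name)
        if actual is None:
            return False
        try:
            ok = OPS[op](float(actual), float(value))
        except ValueError:                 # non-numeric metrics, e.g. 'VoID'
            ok = OPS[op](str(actual), str(value))
        if not ok:
            return False
    return True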
75. Experimental Evaluation
• FedBench Benchmark
• A benchmark suite for federated SPARQL query evaluation
• Provides various data sets from Life Sciences, Linked
Data and Cross Domains
• 25 queries to evaluate the performance
• Testbed
• Datasets are deployed as SPARQL endpoints
• Multiple copies of the datasets to create a higher number of
candidate data sources
• Human intervention to create fluctuations
• Monitoring of the SPARQL endpoints for more than 2 months
• QoS profile generation and updates of the QoS metric values based
on continuous monitoring
78. Conclusion
• Identification and semantic representation of the QoS
parameters of the SPARQL endpoints
• QoS metrics evaluation mechanism
• A monitoring service for QoS evaluation
• SPARQL extension for expressing users' QoS requirements within
the query language
• QoS-aware federated SPARQL query evaluation
79. Future Work
• QoS monitoring over public SPARQL endpoints &
integration with SPARQLES
• Sophisticated mechanisms for Quality of Information
evaluation
• Taking QoS requirements as well as preferences into account
(Hard and Soft Constraints)
• QoS aggregated values