Linked Data Query Processing
Tutorial at the 22nd International World Wide Web Conference (WWW 2013)
May 14, 2013
http://db.uwaterloo.ca/LDQTut2013/
3. Source Selection
Olaf Hartig
University of Waterloo
WWW 2013 Tutorial on Linked Data Query Processing [ Source Selection ] 2
“Ingredients” for LD Query Execution
● Data retrieval approach
● Data source selection
● Data source ranking
(optional, for optimization)
● Result construction approach
● i.e., query-local data processing
● Combining data retrieval
and result construction
[Figure: URI lookups (e.g., GET http://.../movie2449) populate the query-local data, from which the result is constructed; the example result binds ?actor and ?loc to (http://mdb.../Paul, http://geo.../Berlin) and (http://mdb.../Ric, http://geo.../Rome)]
Query-Specific Relevance of URIs
● Definition: A URI is relevant for a given query if looking up
this URI gives us data that contributes to the query result.
● Example:
● Conjunctive query (BGP): { (Bob, lives in, ?x) , (?y, lives in, ?x) }
● Looking up URI Bob gives us: { (Bob, lives in, Berlin) , ... }
● Looking up URI Alice gives us: { (Alice, lives in, Berlin) , ... }
● Hence, μ = { ?x → Berlin , ?y → Alice } is a solution
● Thus, URIs Bob and Alice are relevant for the query
● Simply contributing a matching triple is not sufficient:
● Suppose URI Charles gives us { (Charles, lives in, London) , ... }
● This triple matches (?y, lives in, ?x), but it cannot be combined
with a triple that matches (Bob, lives in, ?x) to compute
a solution. Hence, URI Charles is not relevant.
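The relevance definition above can be made concrete with a small sketch. It assumes the data returned by each URI lookup is already available in a dictionary (the `DATA` mapping and all function names are hypothetical, introduced only for illustration), evaluates the BGP over that data, and records which URIs contributed a triple to at least one solution.

```python
# Hypothetical lookup results, taken from the example on this slide.
DATA = {
    "Bob": {("Bob", "lives in", "Berlin")},
    "Alice": {("Alice", "lives in", "Berlin")},
    "Charles": {("Charles", "lives in", "London")},
}

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check consistency with an earlier binding.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def relevant_uris(bgp, data):
    """Return (solutions, URIs whose data was used in at least one solution)."""
    # Each partial solution pairs a binding with the URIs that contributed to it.
    partial = [({}, frozenset())]
    for pattern in bgp:
        extended = []
        for binding, used in partial:
            for uri, triples in data.items():
                for triple in triples:
                    new_binding = match(pattern, triple, binding)
                    if new_binding is not None:
                        extended.append((new_binding, used | {uri}))
        partial = extended
    solutions = [binding for binding, _ in partial]
    relevant = set().union(*(used for _, used in partial))
    return solutions, relevant

bgp = [("Bob", "lives in", "?x"), ("?y", "lives in", "?x")]
solutions, relevant = relevant_uris(bgp, DATA)
print(sorted(relevant))
```

Note that this computes relevance only after all data has been retrieved, which is exactly why relevance cannot serve as a selection criterion up front.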
Objective of Source Selection
● Source selection: Given a Linked Data query,
determine a set of URIs to look up
● Ideal source selection approach:
● For any query, selects all relevant URIs
● For any query, selects relevant URIs only
● Irrelevant URIs are not required to answer the query
● Avoiding their lookup reduces the cost of query execution
significantly
● Caveat:
● Which URIs are relevant (resp. irrelevant) is unknown
before the query execution has been completed.
Outline
● Objectives of Source Selection √
● Index-Based Strategy
➢ General Idea
➢ Possible Index Structures
● Live Exploration Strategy
● Comparison of both Strategies
● Combining both Strategies
Idea of Index-Based Source Selection
● Use a pre-populated index structure to determine relevant
URIs (and to avoid as many irrelevant ones as possible)
● Example: triple-pattern-based indexes
● For single triple pattern queries, source
selection using such an index structure is
sound and complete (w.r.t. the indexed URIs)
[Figure: index key tp (a triple pattern) maps to the entry { uri1, uri2, … , urin }; looking up any urii (GET urii) yields data that matches tp]
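A triple-pattern-based index can be sketched as a dictionary from triple patterns to sets of URIs. The version below is a minimal illustration, not the structure of any particular system: it assumes each indexed triple is registered under all eight patterns it matches, uses `None` as the wildcard, and ignores repeated variables within a pattern. All names are hypothetical.

```python
from collections import defaultdict
from itertools import product

WILDCARD = None

def pattern_keys(triple):
    """Yield all 8 triple patterns (with wildcards) that the triple matches."""
    options = [(term, WILDCARD) for term in triple]
    return product(*options)

def build_index(data):
    """data maps each indexed URI to the set of triples its lookup returns."""
    index = defaultdict(set)
    for uri, triples in data.items():
        for triple in triples:
            for key in pattern_keys(triple):
                index[key].add(uri)
    return index

def select_sources(index, pattern):
    """Return the indexed URIs whose data contains a triple matching pattern."""
    key = tuple(WILDCARD if term.startswith("?") else term for term in pattern)
    return index.get(key, set())

DATA = {
    "Bob": {("Bob", "lives in", "Berlin")},
    "Alice": {("Alice", "lives in", "Berlin")},
    "Charles": {("Charles", "lives in", "London")},
}
index = build_index(DATA)
print(sorted(select_sources(index, ("?y", "lives in", "?x"))))
```

For the pattern (?y, lives in, ?x) the index returns Charles as well. That is sound and complete for this single pattern, yet Charles is irrelevant for the two-pattern query of the earlier example, which illustrates why the guarantee is stated for single triple pattern queries only.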
General Properties of Lookup Indexes
● Index entries:
● Usually, a set of URIs
● Each URI in such an entry may be paired
with a cardinality (utilized for source ranking)
● Indexed URIs may appear multiple times
(i.e., associated with multiple index keys)
● Type of index keys depends on the
particular index structure used
● e.g., triple patterns
● An index represents a summary of the data from all indexed URIs
● Perfect summary: index keys are individual elements
● Approximate summary: index keys may range over elements
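To illustrate how cardinalities in index entries can drive source ranking, here is a minimal sketch. It assumes, purely for brevity, that index keys are properties and that the cardinality of a URI is the number of its triples using that property; the names and the example document URIs are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(data):
    """Map each property to a Counter: URI -> number of triples with that property."""
    index = defaultdict(Counter)
    for uri, triples in data.items():
        for _subject, prop, _object in triples:
            index[prop][uri] += 1
    return index

def ranked_sources(index, prop):
    """Return the URIs for prop, highest cardinality first (ties broken by URI)."""
    entry = index.get(prop, Counter())
    return [uri for uri, _ in sorted(entry.items(), key=lambda kv: (-kv[1], kv[0]))]

DATA = {
    "http://example.org/doc1": {
        ("Bob", "lives in", "Berlin"),
        ("Alice", "lives in", "Berlin"),
        ("Dana", "lives in", "Rome"),
    },
    "http://example.org/doc2": {("Charles", "lives in", "London")},
}
index = build_index(DATA)
print(ranked_sources(index, "lives in"))
```

A query engine could then look up highly ranked URIs first, or stop after the top-k, trading completeness for cost.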
Perfect Summaries
● Triple-pattern-based indexes
● “Inverted URI Indexing” [UHK+11]
● “Schema-level Indexing” [UHK+11]
● Index keys: schema elements
● Like a triple-pattern-based index that considers only two types
of triple patterns: ( ?s, property, ?o ) and ( ?s, rdf:type, class )
● Tian et al. [TUY11]
● Index keys: Unique encodings of combinations of triple
patterns (i.e., BGPs) frequently found in a query workload
[Figure, inverted URI index: key uri maps to the entry { uri1, … , urin }; uri is mentioned in the data obtained by looking up each urii (GET urii)]
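The first two index types can be sketched as follows. This is an illustration under assumptions rather than the implementation from [UHK+11]: every term of a triple is treated as a URI, the sources for a triple pattern are obtained by intersecting the entries of its constants, and the schema-level index keys are tagged tuples. All names and the `ex:` identifiers are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

def build_inverted_uri_index(data):
    """Key: a URI; entry: the sources whose data mentions that URI."""
    index = defaultdict(set)
    for source, triples in data.items():
        for triple in triples:
            for term in triple:
                index[term].add(source)
    return index

def build_schema_index(data):
    """Keys: properties and classes; entry: the sources whose data uses them."""
    index = defaultdict(set)
    for source, triples in data.items():
        for _s, p, o in triples:
            index[("property", p)].add(source)
            if p == RDF_TYPE:
                index[("class", o)].add(source)
    return index

def select_by_uris(index, pattern):
    """Sources mentioning every constant of the pattern (a superset of the matches)."""
    constants = [t for t in pattern if not t.startswith("?")]
    if not constants:
        return None  # the index cannot narrow down a pattern without constants
    return set.intersection(*(index.get(c, set()) for c in constants))

DATA = {
    "http://example.org/doc1": {
        ("ex:Bob", "ex:livesIn", "ex:Berlin"),
        ("ex:Bob", RDF_TYPE, "ex:Person"),
    },
    "http://example.org/doc2": {("ex:Alice", "ex:livesIn", "ex:Berlin")},
}
inverted = build_inverted_uri_index(DATA)
schema = build_schema_index(DATA)
print(select_by_uris(inverted, ("ex:Bob", "ex:livesIn", "?x")))
```

The schema-level index is much smaller because it has one key per property or class, but it cannot distinguish sources by the instances they describe.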
Approximate Summaries
● Recall, index keys may range over elements
● Advantage: approximation reduces index size
● Disadvantage: index lookup may return false positives
● Examples of data structures used:
● Multidimensional histogram [UHK+11]
● QTree [HHK+10, UHK+11]
Multidimensional Histograms
● Transform RDF triples to points in a 3-dimensional space
(Bob, lives in, Berlin) → hash function → (422, 247, 143)
● Buckets partition that space into disjoint regions
● Indexing: Each bucket contains entries for all URIs whose
data includes an RDF triple in the corresponding region
● Source selection:
● Transform triple patterns to lines / planes in the space
(Bob, lives in, ?x) → (422, 247, ?)
● Any URI relevant for the triple pattern
may only be contained in buckets whose
region is touched by the line / plane
● Pruning due to non-overlapping regions
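The indexing and source selection steps above can be sketched as follows. The hash function (CRC32 modulo the size of the space), the size of the space, and the bucket edge length are assumptions chosen for illustration, so the coordinates differ from the (422, 247, 143) example on the slide.

```python
import zlib
from collections import defaultdict

SPACE = 1000   # assumed: each dimension ranges over 0..999
BUCKET = 250   # assumed: fixed bucket edge length, giving 4 x 4 x 4 buckets

def h(term):
    """Deterministically hash an RDF term to a coordinate (assumed hash function)."""
    return zlib.crc32(term.encode("utf-8")) % SPACE

def bucket_of(triple):
    """Return the grid coordinates of the bucket containing the triple's point."""
    return tuple(h(term) // BUCKET for term in triple)

def build_histogram(data):
    """Map each bucket to the URIs whose data has a triple in that region."""
    buckets = defaultdict(set)
    for uri, triples in data.items():
        for triple in triples:
            buckets[bucket_of(triple)].add(uri)
    return buckets

def select_sources(buckets, pattern):
    """Collect URIs from all buckets touched by the pattern's line or plane."""
    # A constant fixes the bucket coordinate in its dimension; a variable leaves it open.
    fixed = [None if t.startswith("?") else h(t) // BUCKET for t in pattern]
    selected = set()
    for coords, uris in buckets.items():
        if all(f is None or f == c for f, c in zip(fixed, coords)):
            selected |= uris
    return selected

DATA = {
    "Bob": {("Bob", "lives in", "Berlin")},
    "Alice": {("Alice", "lives in", "Berlin")},
    "Charles": {("Charles", "lives in", "London")},
}
buckets = build_histogram(DATA)
print(sorted(select_sources(buckets, ("Bob", "lives in", "?x"))))
```

The selected set always contains every indexed URI with a matching triple, but it may also contain false positives: any URI that happens to share a touched bucket is returned too.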
QTree
● Combination of histograms and R-trees (i.e., hierarchical)
● Leaf nodes are the buckets
● Different buckets may
represent regions of
different size
(in contrast to the fixed-size
regions of a multidimensional histogram)
● Non-populated regions
are ignored
● Deals more efficiently with a space
that is populated sparsely or
contains many clusters
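[Figure: example QTree with a root node, inner nodes A and B, and buckets A1, A2, B1, B2, and C, shown both as nested regions of the space and as a tree]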
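The hierarchical pruning that a QTree enables can be sketched with bounding boxes. The sketch below covers lookup only and assumes a tree that has already been built; QTree construction (inserting points and merging buckets) is not shown. The `Node` class, the coordinates, and the URIs are hypothetical and do not reproduce the figure.

```python
from dataclasses import dataclass, field

# A box is ((lo_s, hi_s), (lo_p, hi_p), (lo_o, hi_o)) with inclusive bounds.

def overlaps(a, b):
    """True if the two boxes intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))

@dataclass
class Node:
    box: tuple
    children: list = field(default_factory=list)  # non-empty for inner nodes
    uris: set = field(default_factory=set)        # populated for leaf buckets

def select_sources(node, query_box):
    """Descend only into subtrees whose region overlaps the query region."""
    if not overlaps(node.box, query_box):
        return set()  # prune the whole subtree
    if not node.children:
        return set(node.uris)
    found = set()
    for child in node.children:
        found |= select_sources(child, query_box)
    return found

# Hypothetical tree: two buckets under one inner node, plus one separate bucket.
leaf1 = Node(box=((400, 450), (200, 300), (100, 200)), uris={"Bob"})
leaf2 = Node(box=((500, 600), (200, 300), (100, 200)), uris={"Dana"})
inner = Node(box=((400, 600), (200, 300), (100, 200)), children=[leaf1, leaf2])
leaf3 = Node(box=((0, 100), (700, 800), (0, 50)), uris={"Charles"})
root = Node(box=((0, 600), (200, 800), (0, 200)), children=[inner, leaf3])

# The pattern (422, 247, ?) becomes a line: fixed in two dimensions, open in the third.
query = ((422, 422), (247, 247), (0, 999))
print(select_sources(root, query))
```

Because regions without data are covered by no node, a query that touches only empty space is rejected near the root without inspecting any bucket.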
Index Construction
● Given a set of URIs to index, each of these URIs needs to
be looked up and its data needs to be retrieved
● Alternative: crawl the Web to obtain URIs and their data
● Alternative: populate index as a by-product of executing
queries using live-exploration-based source selection
Index Maintenance
● Adding additionally discovered URIs
● Keeping the index in sync with original data
● Still an open research problem
● Similar to index maintenance in
information retrieval and
view maintenance in
database systems
Live Exploration
● General idea: Perform a recursive URI lookup process
at query execution runtime
● Start from a set of seed URIs
● Explore the queried Web by traversing data links
● Retrieved data serves two purposes:
(1) Discover further URIs
(2) Construct query result
● Lookup of URIs may be constrained
(i.e., not all links need be traversed)
● Natural support of reachability-based query semantics
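The recursive lookup process can be sketched as a breadth-first traversal. To keep the example runnable without network access, it assumes an in-memory stand-in for the Web (`WEB`) instead of HTTP lookups; the `may_follow` hook models the optional constraint on which links are traversed. All names and `ex:` identifiers are hypothetical.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples returned by looking it up.
WEB = {
    "ex:Bob": {("ex:Bob", "ex:livesIn", "ex:Berlin"), ("ex:Bob", "ex:knows", "ex:Alice")},
    "ex:Alice": {("ex:Alice", "ex:livesIn", "ex:Berlin")},
    "ex:Berlin": {("ex:Berlin", "ex:locatedIn", "ex:Germany")},
    "ex:Charles": {("ex:Charles", "ex:livesIn", "ex:London")},
}

def lookup(uri):
    """Simulated URI lookup; a real engine would issue an HTTP GET here."""
    return WEB.get(uri, set())

def live_exploration(seeds, may_follow=lambda triple, term: True, max_lookups=100):
    """Traverse data links from the seeds; return (query-local data, looked-up URIs)."""
    queue = deque(seeds)
    scheduled = set(seeds)
    looked_up = []
    query_local_data = set()
    while queue and len(looked_up) < max_lookups:
        uri = queue.popleft()
        triples = lookup(uri)
        looked_up.append(uri)
        query_local_data |= triples          # purpose (2): construct the query result
        for triple in triples:               # purpose (1): discover further URIs
            for term in triple:
                if term not in scheduled and may_follow(triple, term):
                    scheduled.add(term)
                    queue.append(term)
    return query_local_data, looked_up

# Constrained traversal: follow subjects and objects, but not predicates.
data, looked_up = live_exploration(
    ["ex:Bob"], may_follow=lambda triple, term: term != triple[1]
)
print(sorted(looked_up))
```

Starting from the seed ex:Bob, the data of ex:Charles is never retrieved because no link leads to it, which is what a reachability-based query semantics captures.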
Comparison to Focused Crawling
Focused crawling:
● Separate pre-runtime (or background) process
● Crawler populates a search index or a local database
● URIs qualify for lookup because of their high relevance for a topic
Live exploration:
● Essential part of the query execution process itself
● Live exploration aims to discover data for answering a particular query
● Relevance of URIs is related to the query at hand
Live Exploration vs. Index-Based
Live exploration:
● Possibilities for parallelized data retrieval are limited
● Data retrieval adds to query execution time significantly
● Usable immediately
● Most suitable for “on-demand” querying scenarios
● Depends on the structure of the network of data links
Index-based:
● Data retrieval can be fully parallelized
● Reduces the impact of data retrieval on query execution time
● Usable only after an initialization phase
● Depends on what has been selected for the index
● May miss new data sources
Neither strategy is superior to the other w.r.t.
result completeness (under full-Web query semantics).
● Both strategies may miss (different) solutions for a query
Hybrid Source Selection
Why not get the best of both strategies by combining them?
● Ideas:
● Use index to obtain seed URIs for live exploration
(e.g., “mixed strategy” [LT10])
● Feed back information discovered by live exploration
to update, to expand, or to reorganize the index
● Use data summary for controlling a live exploration process
(e.g., by prioritizing the URIs scheduled for lookup)
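The first two ideas can be combined in one sketch: the index supplies the seed URIs, and everything discovered during live exploration is fed back to expand the index. This is an illustration under assumptions, not the “mixed strategy” of [LT10]: the index is keyed by property only, and the Web is simulated by a hypothetical in-memory dictionary.

```python
from collections import defaultdict, deque

# Hypothetical stand-in for the Web: URI -> triples returned by looking it up.
WEB = {
    "ex:Bob": {("ex:Bob", "ex:livesIn", "ex:Berlin"), ("ex:Bob", "ex:knows", "ex:Alice")},
    "ex:Alice": {("ex:Alice", "ex:livesIn", "ex:Berlin")},
    "ex:Berlin": {("ex:Berlin", "ex:locatedIn", "ex:Germany")},
}

def lookup(uri):
    """Simulated URI lookup; a real engine would issue an HTTP GET here."""
    return WEB.get(uri, set())

def hybrid_select(index, properties, max_lookups=50):
    """Seed live exploration from the index and feed discoveries back into it."""
    seeds = set()
    for prop in properties:
        seeds |= index.get(prop, set())
    queue = deque(sorted(seeds))
    scheduled = set(seeds)
    data = set()
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        triples = lookup(uri)
        data |= triples
        for s, p, o in triples:
            index[p].add(uri)            # feedback: expand the index
            for term in (s, o):          # follow links to subjects and objects
                if term not in scheduled:
                    scheduled.add(term)
                    queue.append(term)
    return data

# The index initially knows only one source for ex:livesIn.
index = defaultdict(set)
index["ex:livesIn"].add("ex:Bob")
data = hybrid_select(index, ["ex:livesIn"])
print(sorted(index["ex:livesIn"]))
```

After the first execution the index also lists ex:Alice for ex:livesIn, so a later query can retrieve both sources in parallel instead of discovering the second one by traversal.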
Next part: 4. Execution Process ...
These slides have been created by
Olaf Hartig
for the
WWW 2013 tutorial on
Linked Data Query Processing
Tutorial Website: http://db.uwaterloo.ca/LDQTut2013/
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)
(Slides 10, 11, and 12, on multidimensional histograms and the QTree,
are inspired by slides from Andreas Harth [HHK+10]. Thanks!)

Linked Data on the Web
 
Executing SPARQL Queries of the Web of Linked Data
Executing SPARQL Queries of the Web of Linked DataExecuting SPARQL Queries of the Web of Linked Data
Executing SPARQL Queries of the Web of Linked Data
 
Using Web Data Provenance for Quality Assessment
Using Web Data Provenance for Quality AssessmentUsing Web Data Provenance for Quality Assessment
Using Web Data Provenance for Quality Assessment
 
Answers to usual issues in getting started with consuming Linked Data
Answers to usual issues in getting started with consuming Linked DataAnswers to usual issues in getting started with consuming Linked Data
Answers to usual issues in getting started with consuming Linked Data
 
Querying Linked Data with SPARQL
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQL
 
Querying Trust in RDF Data with tSPARQL
Querying Trust in RDF Data with tSPARQLQuerying Trust in RDF Data with tSPARQL
Querying Trust in RDF Data with tSPARQL
 


Tutorial "Linked Data Query Processing" Part 3 "Source Selection Strategies" (WWW 2013 Ed.)

  • 1. Linked Data Query Processing Tutorial at the 22nd International World Wide Web Conference (WWW 2013) May 14, 2013 http://db.uwaterloo.ca/LDQTut2013/ 3. Source Selection Olaf Hartig University of Waterloo
  • 2. “Ingredients” for LD Query Execution
    ● Result construction approach
    ● i.e., query-local data processing
    ● Combining data retrieval and result construction
    ● Data retrieval approach
    ● Data source selection
    ● Data source ranking (optional, for optimization)
    (Figure: GET http://.../movie2449 feeds the query-local data; example bindings for ?actor and ?loc: http://mdb.../Paul with http://geo.../Berlin, and http://mdb.../Ric with http://geo.../Rome)
  • 3. Query-Specific Relevance of URIs
    ● Definition: A URI is relevant for a given query if looking up this URI gives us data that contributes to the query result.
    ● Example:
    ● Conjunctive query (BGP): { (Bob, lives in, ?x) , (?y, lives in, ?x) }
    ● Looking up URI Bob gives us: { (Bob, lives in, Berlin) , ... }
    ● Looking up URI Alice gives us: { (Alice, lives in, Berlin) , ... }
    ● Hence, μ = { ?x → Berlin , ?y → Alice } is a solution
    ● Thus, URIs Bob and Alice are relevant for the query
    ● Simply contributing a matching triple is not sufficient:
    ● Suppose URI Charles gives us { (Charles, lives in, London) , ... }
    ● Since the matching triple cannot be used for computing a solution, URI Charles is not relevant.
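The relevance definition on slide 3 can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `WEB` dictionary is a hypothetical stand-in for URI lookups, and `relevant_uris` is a naive evaluator that tries every combination of retrieved triples. A URI counts as relevant only if one of its triples is actually used in some solution.

```python
from itertools import product

# Hypothetical stand-in for the Web: URI -> triples obtained by looking it up.
WEB = {
    "Bob": {("Bob", "lives in", "Berlin")},
    "Alice": {("Alice", "lives in", "Berlin")},
    "Charles": {("Charles", "lives in", "London")},
}

# The BGP from the slide; terms starting with "?" are variables.
BGP = [("Bob", "lives in", "?x"), ("?y", "lives in", "?x")]


def match(pattern, triple, mu):
    """Extend solution mapping mu so that pattern matches triple, else return None."""
    mu = dict(mu)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:  # variable already bound to another term
                return None
        elif p != t:
            return None
    return mu


def relevant_uris(bgp, web):
    """Return the URIs whose data contributes a triple to at least one solution."""
    tagged = [(uri, t) for uri, triples in web.items() for t in triples]
    relevant = set()
    # Naive evaluation: try every assignment of retrieved triples to triple patterns.
    for combo in product(tagged, repeat=len(bgp)):
        mu = {}
        for pattern, (_, triple) in zip(bgp, combo):
            mu = match(pattern, triple, mu)
            if mu is None:
                break
        else:  # all patterns matched consistently, so combo yields a solution
            relevant.update(uri for uri, _ in combo)
    return relevant
```

In this toy setting the triple from Charles matches the second triple pattern in isolation, but it never joins with the binding ?x → Berlin, so Charles is not reported as relevant.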
  • 4. Objective of Source Selection
    ● Source selection: Given a Linked Data query, determine a set of URIs to look up
    ● Ideal source selection approach:
    ● For any query, selects all relevant URIs
    ● For any query, selects relevant URIs only
    ● Irrelevant URIs are not required to answer the query
    ● Avoiding their lookup reduces the cost of query executions significantly!
    ● Caveat:
    ● Which URIs are relevant (resp. irrelevant) is unknown before the query execution has been completed.
  • 5. Outline
    ● Objectives of Source Selection √
    ● Index-Based Strategy
    ➢ General Idea
    ➢ Possible Index Structures
    ● Live Exploration Strategy
    ● Comparison of both Strategies
    ● Combining both Strategies
  • 6. Idea of Index-Based Source Selection
    ● Use a pre-populated index structure to determine relevant URIs (and to avoid as many irrelevant ones as possible)
    ● Example: triple-pattern-based indexes
    (Figure: Key: tp; Entry: { uri1, uri2, … , urin }; the data obtained via GET urii matches tp)
    ● For single triple pattern queries, source selection using such an index structure is sound and complete (w.r.t. the indexed URIs)
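A triple-pattern-based index as described on slide 6 might be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `None` marks a variable position, and patterns that repeat the same variable are not handled. Each retrieved triple is registered under all 8 triple patterns it matches.

```python
from collections import defaultdict
from itertools import product


def patterns_of(triple):
    """Yield the 8 triple patterns a triple matches; None marks a variable position."""
    for mask in product((True, False), repeat=3):
        yield tuple(term if keep else None for term, keep in zip(triple, mask))


def build_index(data_by_uri):
    """Map every matched triple pattern (key) to the set of URIs (entry)."""
    index = defaultdict(set)
    for uri, triples in data_by_uri.items():
        for triple in triples:
            for tp in patterns_of(triple):
                index[tp].add(uri)
    return index


def select_sources(index, tp):
    """Return the indexed URIs whose data contains a triple matching tp."""
    return set(index.get(tp, set()))
```

Because every key is an individual triple pattern, a lookup for a single triple pattern returns exactly the indexed URIs with matching data. The price is index size, since each triple produces up to 8 keys.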
  • 7. General Properties of Lookup Indexes
    ● Index entries:
    ● Usually, a set of URIs
    ● Each URI in such an entry may be paired with a cardinality (utilized for source ranking)
    ● Indexed URIs may appear multiple times (i.e., associated with multiple index keys)
    ● Type of index keys depends on the particular index structure used
    ● e.g., triple patterns
    ● Represent a summary of the data from all indexed URIs
    ● Perfect summary: index keys are individual elements
    ● Approximate summary: index keys may range over elements
  • 8. Perfect Summaries
    ● Triple-pattern-based indexes
    ● “Inverted URI Indexing” [UHK+11]
    (Figure: Key: uri; Entry: { uri1, … , urin }; uri is mentioned in the data obtained via GET urii)
    ● “Schema-level Indexing” [UHK+11]
    ● Index keys: schema elements
    ● Like a triple-pattern-based index that considers only two types of triple patterns: ( ?s, property, ?o ) and ( ?s, rdf:type, class )
    ● Tian et al. [TUY11]
    ● Index keys: Unique encodings of combinations of triple patterns (i.e., BGPs) frequently found in a query workload
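As a rough illustration of the schema-level idea on slide 8, the sketch below indexes only the two kinds of triple patterns named there. It is not the implementation from [UHK+11]; the key encoding (`("property", p)` and `("class", c)`), the function names, and the use of `None` for variables are all assumptions made for this example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_schema_index(data_by_uri):
    """Index keys are schema elements: ('property', p) or ('class', c)."""
    index = defaultdict(set)
    for uri, triples in data_by_uri.items():
        for _, p, o in triples:
            index[("property", p)].add(uri)
            if p == RDF_TYPE:
                index[("class", o)].add(uri)
    return index


def select_sources(index, tp):
    """Return candidate URIs for tp, or None if tp has no schema element to look up."""
    _, p, o = tp  # None marks a variable position
    if p == RDF_TYPE and o is not None:
        return set(index.get(("class", o), set()))
    if p is not None:
        return set(index.get(("property", p), set()))
    return None  # e.g. (Bob, ?p, ?o): not covered by a schema-level index
```

The index stays small because subjects and non-class objects are ignored. In this sketch a pattern with a bound subject is answered by property alone, so the result may contain URIs that turn out to be irrelevant for that pattern.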
  • 9. Approximate Summaries
    ● Recall, index keys may range over elements
    ● Advantage: approximation reduces index size
    ● Disadvantage: index lookup may return false positives
    ● Examples of data structures used:
    ● Multidimensional histogram [UHK+11]
    ● QTree [HHK+10, UHK+11]
  • 10. Multidimensional Histograms
    ● Transform RDF triples to points in a 3-dimensional space
    (Bob, lives in, Berlin) → hash function → (422, 247, 143)
  • 11. Multidimensional Histograms
    ● Transform RDF triples to points in a 3-dimensional space
    (Bob, lives in, Berlin) → hash function → (422, 247, 143)
    ● Buckets partition that space into disjoint regions
    ● Indexing: Each bucket contains entries for all URIs whose data includes an RDF triple in the corresponding region
    ● Source selection:
    ● Transform triple patterns to lines / planes in the space
    (Bob, lives in, ?x) → (422, 247, ?)
    ● Any URI relevant for the triple pattern may only be contained in buckets whose region is touched by the line / plane
    ● Pruning due to non-overlapping regions
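The histogram-based selection on slide 11 might look roughly like this in code. The sketch makes several assumptions that are not from the slides: CRC32 as the hash function, a coordinate range of 0..999 per dimension, fixed-width buckets of 250, and `None` as the variable marker. The concrete coordinates therefore differ from the (422, 247, 143) example.

```python
import zlib
from collections import defaultdict

SPACE = 1000   # assumed: each dimension ranges over 0..999
BUCKET = 250   # assumed: fixed bucket width, giving 4 x 4 x 4 regions


def coord(term):
    """Hash an RDF term to a coordinate (CRC32 is an arbitrary, deterministic choice)."""
    return zlib.crc32(term.encode("utf-8")) % SPACE


def region(triple):
    """Return the bucket (region) that contains the point for this triple."""
    return tuple(coord(term) // BUCKET for term in triple)


def build_histogram(data_by_uri):
    """Each bucket holds all URIs whose data includes a triple in its region."""
    buckets = defaultdict(set)
    for uri, triples in data_by_uri.items():
        for triple in triples:
            buckets[region(triple)].add(uri)
    return buckets


def select_sources(buckets, tp):
    """Collect URIs from every bucket touched by the line / plane of pattern tp."""
    fixed = [None if term is None else coord(term) // BUCKET for term in tp]
    result = set()
    for reg, uris in buckets.items():
        if all(f is None or f == r for f, r in zip(fixed, reg)):
            result |= uris  # buckets outside the line / plane are pruned
    return result
```

A URI with a matching triple always lands in a touched bucket, so no relevant indexed URI is lost. Unrelated triples that hash into the same region produce the false positives mentioned for approximate summaries.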
  • 12. QTree
    ● Combination of histograms and R-trees (i.e., hierarchical)
    ● Leaf nodes are the buckets
    ● Different buckets may represent regions of different size (in contrast to fixed-sized regions used for MDH)
    ● Non-populated regions are ignored
    ● Deals more efficiently with a space that is populated sparsely or contains many clusters
    (Figure: example QTree with nodes Root, A, B, C, A1, A2, B1, B2)
  • 13. Index Construction
    ● Given a set of URIs to index, each of these URIs needs to be looked up and its data needs to be retrieved
    ● Alternative: crawl the Web to obtain URIs and their data
    ● Alternative: populate the index as a by-product of executing queries using live-exploration-based source selection
  • 14. Index Maintenance
    ● Adding additionally discovered URIs
    ● Keeping the index in sync with the original data
    ● Still an open research problem
    ● Similar to index maintenance in information retrieval and view maintenance in database systems
  • 15. Outline
    ● Objectives of Source Selection √
    ● Index-Based Strategy √
    ➢ General Idea
    ➢ Possible Index Structures
    ● Live Exploration Strategy
    ● Comparison of both Strategies
    ● Combining both Strategies
  • 16. Live Exploration
    ● General idea: Perform a recursive URI lookup process at query execution runtime
    ● Start from a set of seed URIs
    ● Explore the queried Web by traversing data links
    ● Retrieved data serves two purposes: (1) Discover further URIs (2) Construct query result
    ● Lookup of URIs may be constrained (i.e., not all links need be traversed)
    ● Natural support of reachability-based query semantics
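The recursive lookup process on slide 16 can be sketched as a breadth-first traversal. This is a simplified illustration, not the tutorial's algorithm: `lookup` is an assumed callable that returns the triples for a URI (an empty set if the lookup fails), every term in a triple is treated as a URI, and the `follow` predicate and `max_lookups` bound are hypothetical ways to constrain the traversal.

```python
from collections import deque


def live_exploration(seeds, lookup, follow=lambda triple, term: True, max_lookups=100):
    """Traverse data links from the seed URIs; return (query-local data, URIs seen)."""
    queue = deque(seeds)
    seen = set(seeds)
    query_local_data = set()
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for triple in lookup(uri):
            query_local_data.add(triple)      # purpose (2): data for the query result
            for term in triple:               # purpose (1): discover further URIs
                if term not in seen and follow(triple, term):
                    seen.add(term)
                    queue.append(term)
    return query_local_data, seen
```

Passing a restrictive `follow` function is one way to model constrained lookups: only URIs reachable through links that satisfy the constraint are ever dereferenced, which corresponds loosely to a reachability criterion.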
  • 17. Comparison to Focused Crawling
    Focused crawling:
    ● Separate pre-runtime (or background) process
    ● Crawler populates a search index or a local database
    ● URIs qualify for lookup because of their high relevance for a topic
    Live exploration:
    ● Essential part of the query execution process itself
    ● Live exploration aims to discover data for answering a particular query
    ● Relevance of URIs related to the query at hand
  • 18. Outline
    ● Objectives of Source Selection √
    ● Index-Based Strategy √
    ➢ General Idea
    ➢ Possible Index Structures
    ● Live Exploration Strategy √
    ● Comparison of both Strategies
    ● Combining both Strategies
  • 19. Live Exploration vs. Index-Based
    Live exploration:
    ● Possibilities for parallelized data retrieval are limited
    ● Data retrieval adds to query execution time significantly
    ● Usable immediately
    ● Most suitable for “on-demand” querying scenarios
    ● Depends on the structure of the network of data links
    Index-based:
    ● Data retrieval can be fully parallelized
    ● Reduces the impact of data retrieval on query execution time
    ● Usable only after an initialization phase
    ● Depends on what has been selected for the index
    ● May miss new data sources
    Neither strategy is superior to the other w.r.t. result completeness (under full-Web query semantics).
    ● Both strategies may miss (different) solutions for a query
  • 20. Hybrid Source Selection
    Why not get the best of both strategies by combining them?
    ● Ideas:
    ● Use index to obtain seed URIs for live exploration (e.g., “mixed strategy” [LT10])
    ● Feed back information discovered by live exploration to update, to expand, or to reorganize the index
    ● Use data summary for controlling a live exploration process (e.g., by prioritizing the URIs scheduled for lookup)
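The first two hybrid ideas on slide 20 could be combined along the following lines. This is a sketch under simplifying assumptions, not the “mixed strategy” of [LT10]: the index is keyed by predicate only, `lookup` is an assumed callable returning the triples for a URI, and subjects and objects are all treated as URIs to traverse. Prioritizing the scheduled URIs with a data summary is not shown.

```python
from collections import deque


def hybrid_exploration(predicates, index, lookup, max_lookups=100):
    """Seed a link traversal from the index, then feed discoveries back into it.

    index maps a predicate to the set of URIs whose data uses it (assumed key type).
    """
    seeds = set()
    for p in predicates:
        seeds |= index.get(p, set())          # idea 1: index provides the seed URIs
    queue = deque(sorted(seeds))
    seen = set(seeds)
    data = set()
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in lookup(uri):
            data.add((s, p, o))
            index.setdefault(p, set()).add(uri)   # idea 2: expand the index
            for term in (s, o):
                if term not in seen:
                    seen.add(term)
                    queue.append(term)
    return data
```

After a run, URIs that were only reachable by traversal are present in the index, so a later query over the same predicates can start from them directly.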
  • 21. Outline
    ● Objectives of Source Selection √
    ● Index-Based Strategy √
    ➢ General Idea
    ➢ Possible Index Structures
    ● Live Exploration Strategy √
    ● Comparison of both Strategies √
    ● Combining both Strategies √
    Next part: 4. Execution Process ...
  • 22. These slides have been created by Olaf Hartig for the WWW 2013 tutorial on Linked Data Query Processing
    Tutorial Website: http://db.uwaterloo.ca/LDQTut2013/
    This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
    (Slides 10, 11, and 12 are inspired by slides from Andreas Harth [HHK+10]. Thanks!)