SlideShare a Scribd company logo
More Complete Resultset Retrieval from Large
Heterogeneous RDF Sources
Andre Valdestilhas Tommaso Soru Muhammad Saleem
AKSW Group, University of Leipzig, Germany
November 24, 2019
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 1 / 17
Outline
Motivation
Approach
Experiments
Conclusion & Future work
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 2 / 17
Motivation
The LOD Cloud
2007
2009
2019
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 3 / 17
Motivation
Where to find RDF datasets?
9,960
raw RDF datasets658,206
Datasets (HDT files)
LODLaundromat
Which Dataset?
...
559
Endpoint
Different formats1
Query more than 221 billion triples (> 5 Terabytes)
1Serialization.
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 4 / 17
Example
Where to find RDF datasets?
Authors that have a paper type poster/demo in the proceedings of ISWC
20082
2Query from FEDBench
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 5 / 17
Example
Where to find RDF datasets?
Authors that have a paper type poster/demo in the proceedings of ISWC
20083
4 HDT datasets4
containing data that can answer the query
3Query from FEDBench
4Semantic Web Dog Food from LOD Laundromat datasets.
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 6 / 17
Motivation
Approaches
(+) Multiple SPARQL endpoints
(-) 90% are dump files
(+) Dereferenceable URIs
(-) 43% of the URIs are
non-dereferenceable
Endpoint HDT	file file.rdf Dump_file_2
WIMU
Where is my URI?
(+) Data from non-dereferenceable
URIs
(-) No SPARQL query
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 7 / 17
The approach
A hybrid SPARQL query engine
Collect data from multiple SPARQL endpoints,
Data from RDF dumps including HDT files and use Link Traversal
Link Traversal, obtaining data from non-dereferenceable URIs using WIMUa
aWhere is my URI?(WIMU) http://wimu.aksw.org/
Resulting in
More complete results
Experiments with 3 state-of-the-art SPARQL query benchmarks,
LargeRDFBench, FedBench and FEASIBLE
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 8 / 17
The approach
Select ?p ?o
Where {<http://uri.com> ?p ?o}
Endpoint
hdt file
dump.bz2
file.rdf
...
http://uri1.com
http://uri2.com
http://uriN.com
Extract URIs
WIMU
1
2
3
Data Dumps
Query processor
Traversal Based
Query processor
Union of
the results
Source Filtering
SPARQL-a-lot
Query processor
SPARQL Endpoint 
Query processor
wimuQ query
execution engine
Results
<subject1><predicate1><object1>
<subject2><predicate2><object2>
<subjectN><predicateN><objectN>
4
5
6
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 9 / 17
The approach
The source selection
Identify relevant datasets from WIMU
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 10 / 17
Evaluation
Hypothesis Identify automatically relevant sources from heterogeneous RDF
data, even with non-dereferenceable URIs, can improve the
resultset retrieval
Metrics Coverage and runtime
Approaches FedX (endpoints), SQUIN (Traversal-based), SPARQL-a-lot and
WIMU(dumps)
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 11 / 17
Evaluation
Experimental setup
Datasets 221.7 billion triples (>5 terabytes)
Queries 415 queries from FedBench, LargeRDFBench and FEASIBLE
Each query executed 5 times
Hardware 200 GB HD, 8GB RAM, 2.70GHZ single core processor
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 12 / 17
Evaluation
Coverage: Overall 76% queries with results(Zero results=non-public endpoints/data -
non-indexed)
CD LS LD Simple Comp Large Chs Dbpedia SWDF
FedBench | |LargeRDFBench Feasible
100
1000
10000
100000
Averagenumberofresults
onlogscale
EndPoints(FedX) SPARQL-a-lot LinkTraversal(SQUIN) wimuDumps
wimuQ
Approaches and the best coverage
FedBench 55% endpoints
LargeRDFBench 81% wimuDumps
FEASIBLE 98% wimuDumps
Observation
The combination of those query
processing engines implies more resultset
retrieval
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 13 / 17
Evaluation
Number of datasets
More datasets discovered does not implies in more results
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 14 / 17
Evaluation
Runtime
Total Average 17 minutes across 3 benchmarks (wimuDumps 2 min, Endpoints
13 min, SPARQL-a-lot 58 sec, LinkTraversal-SQUIN 36 sec)
CD LS LD Simple Comp Large Chs Dbpedia SWDF
FedBench LargeRDFBench Feasible
| |
1
10
Averagerun-time(minutes)
onlogscale
EndPoints(FedX) SPARQL-a-lot LinkTraversal(SQUIN) wimuDumps
wimuQ
Interesting wimuQ takes 91% of results from wimuDumps, only 7% from
SPARQL endpoints. Possible reason, SPARQL endpoint
federation split among multiple endpoints, network and number
of intermediate results influence in the runtime
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 15 / 17
Conclusion & Future works
Conclusion
A hybrid SPARQL query processing engine to execute SPARQL queries over a
large amount of heterogeneous RDF data
Evaluation on real world datasets using the state of the art of federated and
non-federated query benchmarks (FedBench, LargeRDFBench and
FEASIBLE)
We present the first federated SPARQL query processing engine that executes
SPARQL queries over a total of 221.7 billion triples
Future work
Add more URIs into WIMU index and use Triple Pattern Fragments
A Large Scale approach to study the relation and similarity among the
datasets
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 16 / 17
That’s all Folks!
Thanks!
Questions?Github repository: https://github.com/firmao/wimuT
Prototype: https://w3id.org/wimuq/
Contact: valdestilhas@informatik.uni-leipzig.de
Special thanks to my PhD. advisor Prof. Dr. rer. nat. Thomas Riechert
Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 17 / 17

More Related Content

What's hot

Friday talk 11.02.2011
Friday talk 11.02.2011Friday talk 11.02.2011
Friday talk 11.02.2011
Jürgen Umbrich
 
Ase2010 shang
Ase2010 shangAse2010 shang
Ase2010 shang
SAIL_QU
 
Mining and Untangling Change Genealogies (PhD Defense Talk)
Mining and Untangling Change Genealogies (PhD Defense Talk)Mining and Untangling Change Genealogies (PhD Defense Talk)
Mining and Untangling Change Genealogies (PhD Defense Talk)
Kim Herzig
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementations
Jean-Paul Calbimonte
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
Andy Petrella
 
Big data for SAS programmers
Big data for SAS programmersBig data for SAS programmers
Big data for SAS programmers
Kevin Lee
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
SlideCentral
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architectures
Ian Foster
 
Functional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIsFunctional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIs
Ruben Verborgh
 
The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...
University of California, San Diego
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the Web
Jean-Paul Calbimonte
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
Ian Foster
 
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
AIMS (Agricultural Information Management Standards)
 
Dynamic integrations of crop data and corresponding meteorological data based...
Dynamic integrations of crop data and corresponding meteorological data based...Dynamic integrations of crop data and corresponding meteorological data based...
Dynamic integrations of crop data and corresponding meteorological data based...
AIMS (Agricultural Information Management Standards)
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Globus
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the Web
Daniele Dell'Aglio
 
HDF Town Hall
HDF Town HallHDF Town Hall
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
Ian Foster
 
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting LiStanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
PacificResearchPlatform
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
Globus
 

What's hot (20)

Friday talk 11.02.2011
Friday talk 11.02.2011Friday talk 11.02.2011
Friday talk 11.02.2011
 
Ase2010 shang
Ase2010 shangAse2010 shang
Ase2010 shang
 
Mining and Untangling Change Genealogies (PhD Defense Talk)
Mining and Untangling Change Genealogies (PhD Defense Talk)Mining and Untangling Change Genealogies (PhD Defense Talk)
Mining and Untangling Change Genealogies (PhD Defense Talk)
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementations
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 
Big data for SAS programmers
Big data for SAS programmersBig data for SAS programmers
Big data for SAS programmers
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architectures
 
Functional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIsFunctional Composition of Sensor Web APIs
Functional Composition of Sensor Web APIs
 
The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...The Materials Project - Combining Science and Informatics to Accelerate Mater...
The Materials Project - Combining Science and Informatics to Accelerate Mater...
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the Web
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
Dynamic Integrations of Crop Data and Corresponding Meteorological Data based...
 
Dynamic integrations of crop data and corresponding meteorological data based...
Dynamic integrations of crop data and corresponding meteorological data based...Dynamic integrations of crop data and corresponding meteorological data based...
Dynamic integrations of crop data and corresponding meteorological data based...
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the Web
 
HDF Town Hall
HDF Town HallHDF Town Hall
HDF Town Hall
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting LiStanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 

Similar to More Complete Resultset Retrieval from Large Heterogeneous RDF Sources

2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
Jun Zhao
 
Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011
Juan Sequeda
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013
François Belleau
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
National Information Standards Organization (NISO)
 
Eswc2018 wimu slides
Eswc2018 wimu slidesEswc2018 wimu slides
Eswc2018 wimu slides
André Valdestilhas
 
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
Muhammad Saleem
 
Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009
François Belleau
 
Sustainable queryable access to Linked Data
Sustainable queryable access to Linked DataSustainable queryable access to Linked Data
Sustainable queryable access to Linked Data
Ruben Verborgh
 
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
CIARD Movement
 
On a web of data streams
On a web of data streamsOn a web of data streams
On a web of data streams
Daniele Dell'Aglio
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
François Belleau
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
ebiquity
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Dr.-Ing. Thomas Hartmann
 
RDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactRDF Stream Processing: Let's React
RDF Stream Processing: Let's React
Jean-Paul Calbimonte
 
2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata
Jun Zhao
 
2008 11 13 Hcls Call
2008 11 13 Hcls Call2008 11 13 Hcls Call
2008 11 13 Hcls Call
Jun Zhao
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulation
INRIA-OAK
 
Using Architectures for Semantic Interoperability to Create Journal Clubs for...
Using Architectures for Semantic Interoperability to Create Journal Clubs for...Using Architectures for Semantic Interoperability to Create Journal Clubs for...
Using Architectures for Semantic Interoperability to Create Journal Clubs for...
James Powell
 
Re-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playoutRe-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playout
MediaMixerCommunity
 
Bio2RDF@BH2010
Bio2RDF@BH2010Bio2RDF@BH2010
Bio2RDF@BH2010
François Belleau
 

Similar to More Complete Resultset Retrieval from Large Heterogeneous RDF Sources (20)

2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
 
Eswc2018 wimu slides
Eswc2018 wimu slidesEswc2018 wimu slides
Eswc2018 wimu slides
 
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
 
Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009Bio2RDF @ W3C HCLS2009
Bio2RDF @ W3C HCLS2009
 
Sustainable queryable access to Linked Data
Sustainable queryable access to Linked DataSustainable queryable access to Linked Data
Sustainable queryable access to Linked Data
 
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
Opening and Integration of CASDD and Germplasm Data to AGRIS by Prof. Xuefu Z...
 
On a web of data streams
On a web of data streamsOn a web of data streams
On a web of data streams
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
 
RDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactRDF Stream Processing: Let's React
RDF Stream Processing: Let's React
 
2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata2010 03 Lodoxf Openflydata
2010 03 Lodoxf Openflydata
 
2008 11 13 Hcls Call
2008 11 13 Hcls Call2008 11 13 Hcls Call
2008 11 13 Hcls Call
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulation
 
Using Architectures for Semantic Interoperability to Create Journal Clubs for...
Using Architectures for Semantic Interoperability to Create Journal Clubs for...Using Architectures for Semantic Interoperability to Create Journal Clubs for...
Using Architectures for Semantic Interoperability to Create Journal Clubs for...
 
Re-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playoutRe-using Media on the Web: Media fragment re-mixing and playout
Re-using Media on the Web: Media fragment re-mixing and playout
 
Bio2RDF@BH2010
Bio2RDF@BH2010Bio2RDF@BH2010
Bio2RDF@BH2010
 

More from André Valdestilhas

NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdf
André Valdestilhas
 
Presentation MSE 2022
Presentation MSE 2022Presentation MSE 2022
Presentation MSE 2022
André Valdestilhas
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017
André Valdestilhas
 
WASOTA (poster)
WASOTA (poster)WASOTA (poster)
WASOTA (poster)
André Valdestilhas
 
Iswc 2016 completeness correctude
Iswc 2016 completeness correctudeIswc 2016 completeness correctude
Iswc 2016 completeness correctude
André Valdestilhas
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
André Valdestilhas
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
André Valdestilhas
 
Emotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesEmotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resources
André Valdestilhas
 
Using semiotic profile
Using semiotic profileUsing semiotic profile
Using semiotic profile
André Valdestilhas
 
Emotion-oriented computing: Possible uses and applications
Emotion-oriented computing: Possible uses and  applicationsEmotion-oriented computing: Possible uses and  applications
Emotion-oriented computing: Possible uses and applications
André Valdestilhas
 
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
André Valdestilhas
 

More from André Valdestilhas (11)

NaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdfNaturalMSEQueries_presICWI2023.pdf
NaturalMSEQueries_presICWI2023.pdf
 
Presentation MSE 2022
Presentation MSE 2022Presentation MSE 2022
Presentation MSE 2022
 
Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017Cedal slides. Web Inteligence 2017
Cedal slides. Web Inteligence 2017
 
WASOTA (poster)
WASOTA (poster)WASOTA (poster)
WASOTA (poster)
 
Iswc 2016 completeness correctude
Iswc 2016 completeness correctudeIswc 2016 completeness correctude
Iswc 2016 completeness correctude
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
 
DBpediaSameAs
DBpediaSameAsDBpediaSameAs
DBpediaSameAs
 
Emotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resourcesEmotion-oriented computing: Possible uses and resources
Emotion-oriented computing: Possible uses and resources
 
Using semiotic profile
Using semiotic profileUsing semiotic profile
Using semiotic profile
 
Emotion-oriented computing: Possible uses and applications
Emotion-oriented computing: Possible uses and  applicationsEmotion-oriented computing: Possible uses and  applications
Emotion-oriented computing: Possible uses and applications
 
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
Um estudo sobre localização de serviços sensíveis ao contexto para Televisão ...
 

Recently uploaded

Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij
 
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
OECD Directorate for Financial and Enterprise Affairs
 
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
kekzed
 
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
OECD Directorate for Financial and Enterprise Affairs
 
Carrer goals.pptx and their importance in real life
Carrer goals.pptx  and their importance in real lifeCarrer goals.pptx  and their importance in real life
Carrer goals.pptx and their importance in real life
artemacademy2
 
Competition and Regulation in Professions and Occupations – OECD – June 2024 ...
Competition and Regulation in Professions and Occupations – OECD – June 2024 ...Competition and Regulation in Professions and Occupations – OECD – June 2024 ...
Competition and Regulation in Professions and Occupations – OECD – June 2024 ...
OECD Directorate for Financial and Enterprise Affairs
 
ASONAM2023_presection_slide_track-recommendation.pdf
ASONAM2023_presection_slide_track-recommendation.pdfASONAM2023_presection_slide_track-recommendation.pdf
ASONAM2023_presection_slide_track-recommendation.pdf
ToshihiroIto4
 
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
gpww3sf4
 
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdfWhy Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Ben Linders
 
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussionPro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussionArtificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussionPro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
IEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdfIEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdf
Claudio Gallicchio
 
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdfBRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
Robin Haunschild
 
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussionArtificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
OECD Directorate for Financial and Enterprise Affairs
 
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
OECD Directorate for Financial and Enterprise Affairs
 
XP 2024 presentation: A New Look to Leadership
XP 2024 presentation: A New Look to LeadershipXP 2024 presentation: A New Look to Leadership
XP 2024 presentation: A New Look to Leadership
samililja
 
Disaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other usesDisaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other uses
RIDHIMAGARG21
 
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
OECD Directorate for Financial and Enterprise Affairs
 

Recently uploaded (20)

Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
Suzanne Lagerweij - Influence Without Power - Why Empathy is Your Best Friend...
 
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
 
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
 
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
 
Carrer goals.pptx and their importance in real life
Carrer goals.pptx  and their importance in real lifeCarrer goals.pptx  and their importance in real life
Carrer goals.pptx and their importance in real life
 
Competition and Regulation in Professions and Occupations – OECD – June 2024 ...
Competition and Regulation in Professions and Occupations – OECD – June 2024 ...Competition and Regulation in Professions and Occupations – OECD – June 2024 ...
Competition and Regulation in Professions and Occupations – OECD – June 2024 ...
 
ASONAM2023_presection_slide_track-recommendation.pdf
ASONAM2023_presection_slide_track-recommendation.pdfASONAM2023_presection_slide_track-recommendation.pdf
ASONAM2023_presection_slide_track-recommendation.pdf
 
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
原版制作贝德福特大学毕业证(bedfordhire毕业证)硕士文凭原版一模一样
 
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdfWhy Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
 
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussionPro-competitive Industrial Policy – OECD – June 2024 OECD discussion
Pro-competitive Industrial Policy – OECD – June 2024 OECD discussion
 
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussionArtificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
 
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussionPro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
 
IEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdfIEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdf
 
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdfBRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
 
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussionArtificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
 
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
 
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
 
XP 2024 presentation: A New Look to Leadership
XP 2024 presentation: A New Look to LeadershipXP 2024 presentation: A New Look to Leadership
XP 2024 presentation: A New Look to Leadership
 
Disaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other usesDisaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other uses
 
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
 

More Complete Resultset Retrieval from Large Heterogeneous RDF Sources

  • 1. More Complete Resultset Retrieval from Large Heterogeneous RDF Sources Andre Valdestilhas Tommaso Soru Muhammad Saleem AKSW Group, University of Leipzig, Germany November 24, 2019 Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 1 / 17
  • 2. Outline Motivation Approach Experiments Conclusion & Future work Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 2 / 17
  • 3. Motivation The LOD Cloud 2007 2009 2019 Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 3 / 17
  • 4. Motivation Where to find RDF datasets? 9,960 raw RDF datasets658,206 Datasets (HDT files) LODLaundromat Which Dataset? ... 559 Endpoint Different formats1 Query more than 221 billion triples (> 5 Terabytes) 1Serialization. Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 4 / 17
  • 5. Example Where to find RDF datasets? Authors that have a paper type poster/demo in the proceedings of ISWC 20082 2Query from FEDBench Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 5 / 17
  • 6. Example Where to find RDF datasets? Authors that have a paper type poster/demo in the proceedings of ISWC 20083 4 HDT datasets4 containing data that can answer the query 3Query from FEDBench 4Semantic Web Dog Food from LOD Laundromat datasets. Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 6 / 17
  • 7. Motivation Approaches (+) Multiple SPARQL endpoints (-) 90% are dump files (+) Dereferenceable URIs (-) 43% of the URIs are non-dereferenceable Endpoint HDT file file.rdf Dump_file_2 WIMU Where is my URI? (+) Data from non-dereferenceable URIs (-) No SPARQL query Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 7 / 17
  • 8. The approach A hybrid SPARQL query engine Collect data from multiple SPARQL endpoints, Data from RDF dumps including HDT files and use Link Traversal Link Traversal, obtaining data from non-dereferenceable URIs using WIMUa aWhere is my URI?(WIMU) http://wimu.aksw.org/ Resulting in More complete results Experiments with 3 state-of-the-art SPARQL query benchmarks, LargeRDFBench, FedBench and FEASIBLE Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 8 / 17
  • 9. The approach Select ?p ?o Where {<http://uri.com> ?p ?o} Endpoint hdt file dump.bz2 file.rdf ... http://uri1.com http://uri2.com http://uriN.com Extract URIs WIMU 1 2 3 Data Dumps Query processor Traversal Based Query processor Union of the results Source Filtering SPARQL-a-lot Query processor SPARQL Endpoint  Query processor wimuQ query execution engine Results <subject1><predicate1><object1> <subject2><predicate2><object2> <subjectN><predicateN><objectN> 4 5 6 Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 9 / 17
  • 10. The approach The source selection Identify relevant datasets from WIMU Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 10 / 17
  • 11. Evaluation Hypothesis Identify automatically relevant sources from heterogeneous RDF data, even with non-dereferenceable URIs, can improve the resultset retrieval Metrics Coverage and runtime Approaches FedX (endpoints), SQUIN (Traversal-based), SPARQL-a-lot and WIMU(dumps) Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 11 / 17
  • 12. Evaluation Experimental setup Datasets 221.7 billion triples (>5 terabytes) Queries 415 queries from FedBench, LargeRDFBench and FEASIBLE Each query executed 5 times Hardware 200 GB HD, 8GB RAM, 2.70GHZ single core processor Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 12 / 17
  • 13. Evaluation Coverage: Overall 76% queries with results(Zero results=non-public endpoints/data - non-indexed) CD LS LD Simple Comp Large Chs Dbpedia SWDF FedBench | |LargeRDFBench Feasible 100 1000 10000 100000 Averagenumberofresults onlogscale EndPoints(FedX) SPARQL-a-lot LinkTraversal(SQUIN) wimuDumps wimuQ Approaches and the best coverage FedBench 55% endpoints LargeRDFBench 81% wimuDumps FEASIBLE 98% wimuDumps Observation The combination of those query processing engines implies more resultset retrieval Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 13 / 17
  • 14. Evaluation Number of datasets More datasets discovered does not implies in more results Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 14 / 17
  • 15. Evaluation Runtime Total Average 17 minutes across 3 benchmarks (wimuDumps 2 min, Endpoints 13 min, SPARQL-a-lot 58 sec, LinkTraversal-SQUIN 36 sec) CD LS LD Simple Comp Large Chs Dbpedia SWDF FedBench LargeRDFBench Feasible | | 1 10 Averagerun-time(minutes) onlogscale EndPoints(FedX) SPARQL-a-lot LinkTraversal(SQUIN) wimuDumps wimuQ Interesting wimuQ takes 91% of results from wimuDumps, only 7% from SPARQL endpoints. Possible reason, SPARQL endpoint federation split among multiple endpoints, network and number of intermediate results influence in the runtime Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 15 / 17
  • 16. Conclusion & Future works Conclusion A hybrid SPARQL query processing engine to execute SPARQL queries over a large amount of heterogeneous RDF data Evaluation on real world datasets using the state of the art of federated and non-federated query benchmarks (FedBench, LargeRDFBench and FEASIBLE) We present the first federated SPARQL query processing engine that executes SPARQL queries over a total of 221.7 billion triples Future work Add more URIs into WIMU index and use Triple Pattern Fragments A Large Scale approach to study the relation and similarity among the datasets Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 16 / 17
  • 17. That’s all Folks! Thanks! Questions?Github repository: https://github.com/firmao/wimuT Prototype: https://w3id.org/wimuq/ Contact: valdestilhas@informatik.uni-leipzig.de Special thanks to my PhD. advisor Prof. Dr. rer. nat. Thomas Riechert Valdestilhas et al. (AKSW) More Complete Resultset Retrieval from Large Heterogeneous RDF SourcesNovember 24, 2019 17 / 17