Where is my URI?
Andre Valdestilhas (1), Tommaso Soru (1), Markus Nentwig (2), Edgard Marx (3),
Muhammad Saleem (1), Axel-Cyrille Ngonga Ngomo (1,4)
(1) AKSW Group, University of Leipzig, Germany
(2) Database Group, University of Leipzig, Germany
(3) Leipzig University of Applied Sciences, Germany
(4) Data Science Group, Paderborn University, Germany
June 6, 2018
Valdestilhas et al. (AKSW) Where is my URI? June 6, 2018 1 / 21
Outline
Motivation
Approach
Experiments and Results
Conclusions and Future work
Motivation
[Figure: a URI such as http://uri.com may be found in an endpoint, an HDT file, a dump.bz2, a file.rdf, ...]
Motivation
Example:
http://dbpedia.org/resource/Leipzig -> DBpedia (obvious)
http://citeseer.rkbexplorer.com/id/resource-CS116606 -> km.aifb.kit.edu (not so obvious)
http://quotationsbook.com/author/59 -> no dataset
Motivation
Use cases
Why do we need to know the dataset of a URI?
Data quality in link repositories
Regenerate mappings using the Concise Bounded Descriptions (CBDs) of the URIs, so that
link discovery algorithms can be reapplied to validate the mappings.
Part of LinkLion 2.0
Motivation
[Figure: link discovery between Dataset S and Dataset T produces mappings; given only the mappings, the source and target datasets are unknown (?)]
Mappings
<URISource1><owl:sameAs><URITarget1>
<URISource2><owl:sameAs><URITarget2>
<URISourceN><owl:sameAs><URITargetN>
Motivation
Use cases
Federated Query Processing
Query planning and Source (dataset) selection
WIMU finds the relevant sources for each individual triple pattern of a
given SPARQL query
SELECT ?v1 ?v2 WHERE {
?uri <p1> ?v1 . # Triple Pattern 1 (TP1)
?uri <p2> ?v2 . # Triple Pattern 2 (TP2)
}
Source selection over four RDF sources: S1, S2, S3, S4
WIMU source selection
TP1 = {S1,15},  {S2,9},  {S3,7}
TP2 = {S1,20},  {S2,11},  {S4,10}
Total selected sources = 6
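The count above can be reproduced with a short sketch. It assumes each pair on the slide is (source, score) and that "total selected sources" is the sum of the sources chosen per triple pattern; the function names are hypothetical and not part of WIMU.

```python
# Ranked sources per triple pattern, copied from the slide.
# Assumption: each pair is (source, score reported for that source).
selection = {
    "TP1": [("S1", 15), ("S2", 9), ("S3", 7)],
    "TP2": [("S1", 20), ("S2", 11), ("S4", 10)],
}


def total_selected_sources(selection):
    """Sum the number of sources selected for each triple pattern."""
    return sum(len(sources) for sources in selection.values())


def distinct_sources(selection):
    """Set of sources that appear for at least one triple pattern."""
    return {source for sources in selection.values() for source, _ in sources}


total = total_selected_sources(selection)
distinct = sorted(distinct_sources(selection))
print(total)     # 6
print(distinct)  # ['S1', 'S2', 'S3', 'S4']
```

The total counts S1 and S2 twice because they are selected for both triple patterns; only four distinct sources are involved.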
Approach
Goal
Index URIs and their use to enable Linked Data consumers to find relevant RDF
data sources
Rank the datasets proportionally to the number of literals
Keep the provenance of the URI
Approach
Steps to create the index
1. Retrieve the list of datasets from the sources (dumps & endpoints, LOD Laundromat, HDT files).
2. Retrieve the data from the datasets:
SELECT ?s (COUNT(?o) AS ?c)
WHERE {
?s ?p ?o . FILTER(isLiteral(?o))
} GROUP BY ?s
3. Build three indexes from the datasets, building the rank.
4. Merge the indexes into one.
5. Make the index available and browsable via a web application and an API service (WIMU Service).
Index (URI & dataset & score):
URI         Dataset   Score
http://...  dataset1  3
http://...  dataset2  1
http://...  datasetN  n
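A minimal sketch of the literal-count ranking behind these steps. It is not the WIMU implementation: it assumes triples are already available as tuples of N-Triples terms, builds a single index in one pass instead of three indexes that are merged afterwards, and uses made-up dataset names and URIs.

```python
from collections import Counter, defaultdict


def count_literals(triples):
    """Per subject URI, count the triples whose object is a literal.

    Each triple is a (subject, predicate, object) tuple of N-Triples terms,
    so a literal object starts with a double quote.
    """
    counts = Counter()
    for subject, _predicate, obj in triples:
        if obj.startswith('"'):
            counts[subject] += 1
    return counts


def build_index(datasets):
    """Map every URI to its datasets, ranked by literal count (highest first)."""
    index = defaultdict(list)
    for dataset_name, triples in datasets.items():
        for uri, score in count_literals(triples).items():
            index[uri].append((dataset_name, score))
    for uri in index:
        index[uri].sort(key=lambda pair: pair[1], reverse=True)
    return dict(index)


# Toy data: dataset names and triples are invented for illustration.
datasets = {
    "dataset1": [
        ("<http://example.org/a>", "<http://example.org/name>", '"A"'),
        ("<http://example.org/a>", "<http://example.org/label>", '"Alpha"@en'),
        ("<http://example.org/a>", "<http://example.org/knows>", "<http://example.org/b>"),
    ],
    "dataset2": [
        ("<http://example.org/a>", "<http://example.org/name>", '"A"'),
    ],
}

index = build_index(datasets)
print(index["<http://example.org/a>"])  # [('dataset1', 2), ('dataset2', 1)]
```

The per-subject count mirrors the GROUP BY query of step 2; the sort gives the rank used to order datasets for a URI.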
Approach
The interface (Web and JSON)
curl "http://wimu.aksw.org/Find?top=5&uri=http://dbpedia.org/resource/Leipzig"
[{"dataset":"http://downloads.dbpedia.org/2016-10/core-i18n/en/infobox_properties_en.ttl.bz2","CountLiteral":"236"},
{"dataset":"http://gaia.infor.uva.es/hdt/dbpedia2015.hdt.gz","CountLiteral":"165"},
{"dataset":"http://download.lodlaundromat.org/a9dabf348fd6262edfbbcf7256b0f839?type=hdt","CountLiteral":"142"},
{"dataset":"http://download.lodlaundromat.org/dde1dcc095b38a1b65ebfbc7696d7998?type=hdt","CountLiteral":"124"}]
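A client might consume this response as sketched below. The body is the output shown on the slide, embedded as a string so that the example runs without network access; the helper name is hypothetical. Note that CountLiteral arrives as a string and has to be converted before sorting.

```python
import json

# Response body copied from the slide (four entries were shown).
response_body = """[
{"dataset":"http://downloads.dbpedia.org/2016-10/core-i18n/en/infobox_properties_en.ttl.bz2","CountLiteral":"236"},
{"dataset":"http://gaia.infor.uva.es/hdt/dbpedia2015.hdt.gz","CountLiteral":"165"},
{"dataset":"http://download.lodlaundromat.org/a9dabf348fd6262edfbbcf7256b0f839?type=hdt","CountLiteral":"142"},
{"dataset":"http://download.lodlaundromat.org/dde1dcc095b38a1b65ebfbc7696d7998?type=hdt","CountLiteral":"124"}
]"""


def ranked_datasets(body):
    """Return (dataset, literal_count) pairs, highest count first."""
    entries = json.loads(body)
    pairs = [(entry["dataset"], int(entry["CountLiteral"])) for entry in entries]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranking = ranked_datasets(response_body)
best_dataset, best_count = ranking[0]
print(best_dataset, best_count)
```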
Approach
Usage
Service: https://wimu.aksw.org/Find
Parameter  Default  Description
top        0        Top occurrences of the datasets
uri        –        URI to search for
link       –        URL of a linkset
cbd        –        URI that is the origin of the CBD
ds         –        URL to download the dataset
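Request URLs for the service can be assembled from these parameters, for instance as follows. This is a sketch: the helper `find_url` is hypothetical, it only builds the URL and does not send a request, and it percent-encodes every value, which is an assumption about what the service accepts (the CBD example in the slides uses an encoded value).

```python
from urllib.parse import urlencode

SERVICE = "https://wimu.aksw.org/Find"  # service URL from the slide


def find_url(**params):
    """Build a request URL for the Find service from keyword parameters."""
    allowed = {"top", "uri", "link", "cbd", "ds"}  # parameters listed on the slide
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return f"{SERVICE}?{urlencode(params)}"


url = find_url(top=5, uri="http://dbpedia.org/resource/Leipzig")
print(url)
# https://wimu.aksw.org/Find?top=5&uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLeipzig
```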
Approach
Examples
Single URI, top 5 datasets
https://wimu.aksw.org/Find?top=5&uri=http://dbpedia.org/resource/Leipzig
Linkset
https://wimu.aksw.org/Find?link=http://www.linklion.org/download/mapping/sws.geonames.org---purl.org.nt
CBD
http://wimu.aksw.org/Find?cbd=http%3A%2F%2Fciteseer.rkbexplorer.com%2Fid%2Fresource-CS116606&ds=http://download.lodlaundromat.org/b7081efa178bc4ab3ff3a6ef5abac9b2?type=hdt
Approach
Relevant points
Ranks the datasets from LODStats and LODLaundromat using a score function
Can process linksets (more than one URI per request)
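To illustrate what processing a linkset involves, the sketch below extracts the distinct URIs from owl:sameAs statements so that each one could then be looked up. It assumes one `<subject> <predicate> <object> .` statement per line with URI objects; the regular expression, the function name, and the example URIs are illustrative and not taken from WIMU.

```python
import re

# One <subject> <predicate> <object> . statement per line (URIs only).
TRIPLE = re.compile(r"^\s*<([^>]*)>\s*<([^>]*)>\s*<([^>]*)>\s*\.?\s*$")


def linkset_uris(lines):
    """Collect the distinct source and target URIs of a linkset, in order."""
    seen = {}
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip blank lines, comments, and non-URI objects
        source, _predicate, target = match.groups()
        seen.setdefault(source, None)
        seen.setdefault(target, None)
    return list(seen)


# Made-up linkset for illustration.
linkset = [
    "<http://example.org/s1> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/t1> .",
    "<http://example.org/s2> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/t1> .",
]
uris = linkset_uris(linkset)
print(uris)
```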
Experiments and results
Experimental Setup
Hardware: 64 CPU cores, 126 GB RAM, 2 TB hard disk.
Creation of the index: 3 days and 7 hours.
Experiments and results
The heuristic behind the Literals
A sample of 100 DBpedia URIs and the occurrences of the top-1 datasets
according to the number of literals
[Bar chart: one bar per dataset, axis from 0 to 70]
Datasets
DBpedia:LODLaundromat::0d75803ad1faf47355a2656821d598bb?type=hdt
DBpedia:LODLaundromat::1fdf94a6a8ec729410fe99da03f721d9?type=hdt
DBpedia:LODLaundromat::4579a1f4a4637037f1995717b96f7bb5?type=hdt
DBpedia:LODLaundromat::7797f921760e5d4a02ee6eca37c951ec?type=hdt
DBpedia:LODLaundromat::e8f74020ae3cf760c34af2b20c4451c4?type=hdt
DBpedia:2016:infobox_properties_en.ttl.bz2
DBpedia:RDFHDT:dbpedia2015.hdt.gz
Manually checked: 100% precision
Experiments and results
The heuristic behind the Literals
Top 100 datasets for http://dbpedia.org/resource/Leipzig
[Plot: number of literals per dataset, axis from 0 to 250]
Experiments and results
The heuristic behind the Literals
Rank of the top 5 datasets
[Bar chart: literals for D1 to D5, http://dbpedia.org/resource/Sandersleben, axis from 0 to 80]
[Bar chart: literals for D6 to D10, http://dbpedia.org/resource/Leipzig, axis from 0 to 240]
D1 http://km.aifb.kit.edu/projects/btc-2009/btc-2009-chunk-101.gz
D2 http://km.aifb.kit.edu/projects/btc-2012/dbpedia/data-0.nq.gz
D3 http://gaia.infor.uva.es/hdt/dbpedia2015.hdt.gz
D4 http://data.dws.informatik.uni-mannheim.de/dbpedia/3.8/ru/infobox_properties_en_uris_ru.nt.bz2
D5 http://data.dws.informatik.uni-mannheim.de/dbpedia/3.9/ru/raw_infobox_properties_en_uris_ru.nt.bz2
D6 http://downloads.dbpedia.org/2016-10/core-i18n/en/infobox_properties_en.ttl.bz2
D7 http://gaia.infor.uva.es/hdt/dbpedia2015.hdt.gz
D8 http://km.aifb.kit.edu/projects/btc-2009/btc-2009-chunk-061.gz
D9 http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/infobox_properties_unredirected_en.nq.bz2
D10 http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/infobox_properties_en.nq.bz2
Experiments and results
Datasets
                   LOD Laundromat  LODStats        Total
URIs indexed       4,185,133,445   31,121,342      4,216,254,787
Datasets checked   658,206         9,960           668,166
Triples processed  19,891,702,202  38,606,408,854  58,498,111,056
LODStats
60% offline, 14% empty
8% of the triples with literals as objects are blank nodes
35% of the online datasets present some errors using Jena
69.8% of the datasets have parser errors
LODLaundromat
2.3% parsing errors. 99% indexed by WIMU
Experiments and results
Datasets
Jena parsing errors
Content negotiation error from Content-type: text/html.
Error making the query, see cause for details
null
Service URL overrides the 'query' SPARQL protocol parameter
XMLStreamException: ParseError at [row,col]:[-8251091,8]
Failed when initializing the StAX parsing engine
Not all datasets are ready to use
Conclusions and Future work
A regularly updated database index of more than 660K datasets from
LODStats and LODLaundromat
An efficient service on the web that can determine which dataset will most
likely define a URI
Various statistics of datasets indexed from LODStats and LODLaundromat
Future work: Integrate the second version of LinkLion
http://www.linklion.org
That’s all Folks!
Thanks!
Questions?
Website: http://wimu.aksw.org/
Contact: valdestilhas@informatik.uni-leipzig.de
This work was supported by grants from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptx
 

Eswc2018 wimu slides

  • 1. Where is my URI? Andre Valdestilhas (1), Tommaso Soru (1), Markus Nentwig (2), Edgard Marx (3), Muhammad Saleem (1), Axel-Cyrille Ngonga Ngomo (1,4). Affiliations: (1) AKSW Group, University of Leipzig, Germany; (2) Database Group, University of Leipzig, Germany; (3) Leipzig University of Applied Sciences, Germany; (4) Data Science Group, Paderborn University, Germany. June 6, 2018. Valdestilhas et al. (AKSW) Where is my URI? June 6, 2018 1 / 21
  • 2. Outline: Motivation; Approach; Experiments and Results; Conclusions and Future work.
  • 3. Motivation: endpoint, hdt file, dump.bz2, file.rdf, http://uri.com, ...
  • 5. Motivation: Use cases. Why do we need to know the dataset of a URI? Data quality in link repositories: regenerate mappings using the CBDs (Concise Bounded Descriptions) to reapply link discovery algorithms in order to validate the mappings. Part of LinkLion 2.0.
  • 7. Motivation: Use cases. Federated query processing: query planning and source (dataset) selection. WIMU finds the relevant sources for each individual triple pattern of a given SPARQL query:
      SELECT ?v1 ?v2 WHERE {
        ?uri <p1> ?v1 .  # Triple Pattern 1 (TP1)
        ?uri <p2> ?v2 .  # Triple Pattern 2 (TP2)
      }
    Source selection over the RDF sources S1, S2, S3, S4. WIMU source selection: TP1 = {S1,15}, {S2,9}, {S3,7}; TP2 = {S1,20}, {S2,11}, {S4,10}. Total selected sources = 6.
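The tally on this slide can be reproduced in a few lines. This is only a sketch, assuming the second value in each pair is a per-source score; the names `candidates`, `total_selected` and `distinct_sources` are hypothetical and not part of WIMU.

```python
# Candidate sources per triple pattern, copied from the slide.
# Assumption: the second value in each pair is a per-source score.
candidates = {
    "TP1": [("S1", 15), ("S2", 9), ("S3", 7)],
    "TP2": [("S1", 20), ("S2", 11), ("S4", 10)],
}


def total_selected(candidates):
    """Sum the selected sources over all triple patterns.

    A source that is relevant to two patterns is counted twice.
    """
    return sum(len(sources) for sources in candidates.values())


def distinct_sources(candidates):
    """Return the set of source names that appear for any pattern."""
    return {name for sources in candidates.values() for name, _ in sources}


print(total_selected(candidates))            # 6, as on the slide
print(sorted(distinct_sources(candidates)))  # ['S1', 'S2', 'S3', 'S4']
```

The total of 6 counts S1 and S2 once per triple pattern; only four distinct sources are involved.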
  • 8. Approach: Goal. Index URIs and their use to enable Linked Data consumers to find relevant RDF data sources. Rank the datasets proportionally to the number of literals. Keep the provenance of the URI.
  • 10. Approach: Steps to create the index.
      Inputs: Dumps & Endpoints, LOD Laundromat, HDT files.
      1. Retrieve list of datasets from sources.
      2. Retrieve data from datasets:
           SELECT ?s (count(?o) as ?c) WHERE { ?s [] ?o . FILTER(isliteral(?o)) } GROUP BY ?s
      3. Build three indexes from the datasets, building the rank.
      4. Merge the indexes into one.
      5. Make the index available and browsable via a web application and an API service.
      Resulting index (URI & Dataset & Score), served by the WIMU Service:
        URI        | Dataset  | Score
        http://... | dataset1 | 3
        http://... | dataset2 | 1
        http://... | datasetN | n
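The counting, merging and ranking steps listed on this slide can be illustrated with a small in-memory sketch. It is not the WIMU implementation: the `Literal` marker class, the function names and the toy triples are all hypothetical, and a real pipeline would read dumps, endpoints and HDT files instead of Python lists.

```python
from collections import Counter, defaultdict


class Literal(str):
    """Hypothetical marker: an object value that is an RDF literal."""


def count_literals(triples):
    """Per subject, count the triples whose object is a literal."""
    counts = Counter()
    for subject, _predicate, obj in triples:
        if isinstance(obj, Literal):
            counts[subject] += 1
    return counts


def build_index(datasets):
    """Map each subject URI to {dataset name: literal count}."""
    index = defaultdict(dict)
    for name, triples in datasets.items():
        for uri, n in count_literals(triples).items():
            index[uri][name] = n
    return index


def merge_indexes(*indexes):
    """Merge several URI indexes into one."""
    merged = defaultdict(dict)
    for index in indexes:
        for uri, per_dataset in index.items():
            merged[uri].update(per_dataset)
    return merged


def rank(index, uri, top=0):
    """Datasets for a URI, highest literal count first (top=0 means all)."""
    ranked = sorted(index.get(uri, {}).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top] if top > 0 else ranked


# Toy data: two indexes built from different (made-up) sources.
a = "http://example.org/a"
dumps = build_index({"dataset1": [
    (a, "p1", Literal("x")), (a, "p2", Literal("y")), (a, "p3", Literal("z")),
    (a, "p4", "http://example.org/b"),  # object is a URI, not counted
]})
hdt = build_index({"dataset2": [(a, "p1", Literal("x"))]})

index = merge_indexes(dumps, hdt)
print(rank(index, a))  # [('dataset1', 3), ('dataset2', 1)]
```

Keeping one index per source type and merging afterwards mirrors the order of steps 3 and 4; the scores 3 and 1 were chosen to match the example rows of the index table.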
  • 11. Approach: The interface (Web and JSON).
      curl "http://wimu.aksw.org/Find?top=5&uri=http://dbpedia.org/resource/Leipzig"
      [{"dataset":"http://downloads.dbpedia.org/2016-10/core-i18n/en/infobox_properties_en.ttl.bz2","CountLiteral":"236"},
       {"dataset":"http://gaia.infor.uva.es/hdt/dbpedia2015.hdt.gz","CountLiteral":"165"},
       {"dataset":"http://download.lodlaundromat.org/a9dabf348fd6262edfbbcf7256b0f839?type=hdt","CountLiteral":"142"},
       {"dataset":"http://download.lodlaundromat.org/dde1dcc095b38a1b65ebfbc7696d7998?type=hdt","CountLiteral":"124"}]
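A client consuming this JSON has to convert `CountLiteral`, which the response above returns as a string, into a number before comparing scores. The sketch below parses a shortened copy of the response shown on the slide; `parse_find_response` is a hypothetical helper name, and no network request is made.

```python
import json

# First two entries of the response shown on the slide.
SAMPLE = """[
 {"dataset":"http://downloads.dbpedia.org/2016-10/core-i18n/en/infobox_properties_en.ttl.bz2","CountLiteral":"236"},
 {"dataset":"http://gaia.infor.uva.es/hdt/dbpedia2015.hdt.gz","CountLiteral":"165"}
]"""


def parse_find_response(text):
    """Return (dataset URL, literal count) pairs, highest count first."""
    rows = json.loads(text)
    # CountLiteral arrives as a string, so convert it before sorting.
    ranked = [(row["dataset"], int(row["CountLiteral"])) for row in rows]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


for dataset, count in parse_find_response(SAMPLE):
    print(count, dataset)
```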
  • 12. Approach: Usage. Service: https://wimu.aksw.org/Find
      Parameter | Default | Description
      top       | 0       | Top occurrences of the datasets
      uri       | none    | URI to search for
      link      | none    | URL of a linkset
      cbd       | none    | The URI that will be the origin of the CBD
      ds        | none    | URL to download the dataset
  • 13. Approach: Examples.
      Single URI, top 5 datasets: https://wimu.aksw.org/Find?top=5&uri=http://dbpedia.org/resource/Leipzig
      Linkset: https://wimu.aksw.org/Find?link=http://www.linklion.org/download/mapping/sws.geonames.org---purl.org.nt
      CBD: http://wimu.aksw.org/Find?cbd=http%3A%2F%2Fciteseer.rkbexplorer.com%2Fid%2Fresource-CS116606&ds=http://download.lodlaundromat.org/b7081efa178bc4ab3ff3a6ef5abac9b2?type=hdt
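Request URLs like the ones above can be assembled programmatically so that nested URLs in parameter values are percent-encoded consistently. This is a sketch using only the Python standard library; `find_url` is a hypothetical helper, and it assumes the service accepts percent-encoded values for every parameter, as the CBD example does for `cbd`.

```python
from urllib.parse import urlencode

SERVICE = "https://wimu.aksw.org/Find"


def find_url(**params):
    """Build a Find URL from keyword parameters (top, uri, link, cbd, ds).

    Parameters set to None are omitted; values are percent-encoded.
    """
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{SERVICE}?{query}"


print(find_url(top=5, uri="http://dbpedia.org/resource/Leipzig"))
# https://wimu.aksw.org/Find?top=5&uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLeipzig
```

Encoding matters most for the `ds` parameter, whose value contains its own `?type=hdt` query string.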
  • 14. Approach: Relevant points. Ranks the datasets from LODStats and LODLaundromat using a score function. Is able to process linksets (more than one URI per request).
  • 15. Experiments and results: Experimental setup. Hardware: 64 CPU cores, 126 GB RAM, 2 TB hard disk. Creation of the index: 3 days and 7 hours.
  • 16. Experiments and results: The heuristic behind the literals. A sample of 100 DBpedia URIs and occurrences of the top-1 datasets according to the number of literals. Manually checked: 100% precision.
      [Bar chart over the following datasets]
      DBpedia:LODLaundromat::0d75803ad1faf47355a2656821d598bb?type=hdt
      DBpedia:LODLaundromat::1fdf94a6a8ec729410fe99da03f721d9?type=hdt
      DBpedia:LODLaundromat::4579a1f4a4637037f1995717b96f7bb5?type=hdt
      DBpedia:LODLaundromat::7797f921760e5d4a02ee6eca37c951ec?type=hdt
      DBpedia:LODLaundromat::e8f74020ae3cf760c34af2b20c4451c4?type=hdt
      DBpedia:2016:infobox_properties_en.ttl.bz2
      DBpedia:RDFHDT:dbpedia2015.hdt.gz
  • 17. Experiments and results: The heuristic behind the literals. Top 100 datasets from http://dbpedia.org/resource/Leipzig. [Chart; axis: Literals]
  • 18. Experiments and results: The heuristic behind the literals. Rank top 5 datasets. [Two bar charts; axis: Literals]
      http://dbpedia.org/resource/Sandersleben (D1 to D5):
        D1 http://km.aifb.kit.edu/projects/btc-2009/btc-2009-chunk-101.gz
        D2 http://km.aifb.kit.edu/projects/btc-2012/dbpedia/data-0.nq.gz
        D3 http://gaia.infor.uva.es/hdt/dbpedia2015.hdt.gz
        D4 http://data.dws.informatik.uni-mannheim.de/dbpedia/3.8/ru/infobox_properties_en_uris_ru.nt.bz2
        D5 http://data.dws.informatik.uni-mannheim.de/dbpedia/3.9/ru/raw_infobox_properties_en_uris_ru.nt.bz2
      http://dbpedia.org/resource/Leipzig (D6 to D10):
        D6 http://downloads.dbpedia.org/2016-10/core-i18n/en/infobox_properties_en.ttl.bz2
        D7 http://gaia.infor.uva.es/hdt/dbpedia2015.hdt.gz
        D8 http://km.aifb.kit.edu/projects/btc-2009/btc-2009-chunk-061.gz
        D9 http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/infobox_properties_unredirected_en.nq.bz2
        D10 http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/infobox_properties_en.nq.bz2
  • 19. Experiments and results: Datasets.
                        | LOD Laundromat | LODStats       | Total
      URIs indexed      | 4,185,133,445  | 31,121,342     | 4,216,254,787
      Datasets checked  | 658,206        | 9,960          | 668,166
      Triples processed | 19,891,702,202 | 38,606,408,854 | 58,498,111,056
      LODStats: 60% offline, 14% empty; 8% triples with literals as objects are blank nodes; 35% online datasets present some errors using Jena; 69.8% datasets with parser errors.
      LODLaundromat: 2.3% parsing errors. 99% indexed by WIMU.
  • 20. Experiments and results: Datasets. Not all datasets are ready to use. Jena parsing errors:
      Content negotiation error from Content-type: text/html.
      Error making the query, see cause for details
      null
      Service URL overrides the 'query' SPARQL protocol parameter
      XMLStreamException: ParseError at [row,col]: [-8251091,8]
      Failed when initializing the StAX parsing engine
  • 21. Conclusions and Future work. A regularly updated database index of more than 660K datasets from LODStats and LODLaundromat. An efficient service on the web that can determine which dataset will most likely define a URI. Various statistics of datasets indexed from LODStats and LODLaundromat. Future work: integrate the second version of LinkLion, http://www.linklion.org
  • 22. That’s all Folks! Thanks! Questions? Website: http://wimu.aksw.org/ Contact: valdestilhas@informatik.uni-leipzig.de This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).