SlideShare a Scribd company logo
DBpediaNYD –
A Silver Standard Benchmark Dataset
for Semantic Relatedness in DBpedia

10/22/13 Paulheim Heiko Paulheim
Heiko

1
Motivation
•

There are quite a few approaches to entity ranking/
statement weighting on Linked Data
– and DBpedia in particular

•

Examples:
– Franz et al. (2009) – Tensor Decomposition
– Meij et al. (2009) – Machine Learning
– Mirizzi et al. (2010) – Web Search Engines
– Mulay and Kumar (2011) – Machine Learning
– Hees et al. (2012) – Crowd Sourcing
– Nunes et al. (2012) – Social Network Analysis

10/22/13

Heiko Paulheim

2
Motivation
•

However,
– none of those have been competitively evaluated
– none of those have been evaluated at large scale

•

Evaluation with
– small private data sets
– user studies

•

Approaches using Machine Learning
– requires training data
– expensive to obtain

10/22/13

Heiko Paulheim

3
The Dataset
•

Large-scale dataset (several thousand instances)
– statements with strengths

•

Strength value: Normalized Google Distance

•

f(x): number of search results containing x

•

f(x,y): number of search results containing both x and y

•

M: number of pages in search engine index

•

NGD has been shown to correlate with human strength associations

10/22/13

Heiko Paulheim

4
The Dataset
•

NGD is a symmetric value
– NYD dataset also contains asymmetric values

•

Asymmetric Normalized Google Distance

•

f(x): number of search results containing x

•

f(x,y): number of search results containing both x and y

•

M: number of pages in search engine index

10/22/13

Heiko Paulheim

5
Constructing the Dataset
•

We sampled 10,000 statements
– with DBpedia resources as subject and object
(e.g., no type statements, no literals)
– with dbpedia or dbpprop predicate

•

...and computed symmetric/asymmetric NGD
– using the labels as search strings
– using Yahoo BOSS

10/22/13

Heiko Paulheim

6
The Dataset
•

Random sample of 10,000 statements
– i.e., 30,000 search engine calls (80c/1,000 → 24 USD)

•

3,058 pairs of resources had to be discarded
– f(x)<f(x,y) or f(y)<f(x,y)
– search engines sometimes don't count properly :-(

•

Result:
– 6,942 weighted statements (symmetric)
– 13,884 weighted statements (asymmetric)

10/22/13

Heiko Paulheim

7
The Dataset
•

Example:
– dbpedia:John_Lennon and dbpedia:Yoko_Ono

•

Distances:
– symmetric: 0.18
– John Lennon → Yoko Ono 0.18
– Yoko Ono → John Lennon 0.03

•

Explanation:
– Yoko Ono is famous for being John Lennon's wife
• and most often mentioned in that context
– John Lennon is more famous for being a member of the Beatles

10/22/13

Heiko Paulheim

8
Example: the DBpedia FindRelated Service
•

We trained two regression SVMs (LibSVM) based on DBpediaNYD
– one for symmetric, one for asymmetric
– service allows for finding the most related among the linked resources

•

Example results:

•

http://wiki.dbpedia.org/FindRelated

10/22/13

Heiko Paulheim

9
Conclusion and Outlook
•

DBpediaNYD allows for large scale evaluation
– rather a silver standard
– does not replace manually created gold standards

•

Future work
– validate DBpediaNYD with users
– compare search engines

10/22/13

Heiko Paulheim

10
Something Completely Different
•

Challenges enumerated in the workshop intro this morning
– “Logical inference on noisy data”

•

Talk on “Type Inference on Noisy RDF Data”
– Was actually applied for DBpedia 3.9
– Friday, 3:15, Bayside 204A

10/22/13

Heiko Paulheim

11
DBpediaNYD –
A Silver Standard Benchmark Dataset
for Semantic Relatedness in DBpedia

10/22/13 Paulheim Heiko Paulheim
Heiko

12

More Related Content

What's hot

Similarity: Retrieving Documents
Similarity: Retrieving DocumentsSimilarity: Retrieving Documents
Similarity: Retrieving Documents
Learnbay Datascience
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload
Prof. Wim Van Criekinge
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystified
Jakob .
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations arise
University of Bologna
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integration
Rutger Vos
 
Dbd arrrrcamp-2013
Dbd arrrrcamp-2013Dbd arrrrcamp-2013
Dbd arrrrcamp-2013
Peter Vandenabeele
 

What's hot (6)

Similarity: Retrieving Documents
Similarity: Retrieving DocumentsSimilarity: Retrieving Documents
Similarity: Retrieving Documents
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystified
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations arise
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integration
 
Dbd arrrrcamp-2013
Dbd arrrrcamp-2013Dbd arrrrcamp-2013
Dbd arrrrcamp-2013
 

Viewers also liked

Using DBpedia for Thesaurus Management and Linked Open Data Integration
Using DBpedia for Thesaurus Management and Linked Open Data IntegrationUsing DBpedia for Thesaurus Management and Linked Open Data Integration
Using DBpedia for Thesaurus Management and Linked Open Data Integration
Martin Kaltenböck
 
Portails documentaires et référentiels du Web sémantique : exemples et enjeu...
Portails documentaires et  référentiels du Web sémantique : exemples et enjeu...Portails documentaires et  référentiels du Web sémantique : exemples et enjeu...
Portails documentaires et référentiels du Web sémantique : exemples et enjeu...
Alexandre Monnin
 
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...ADBS
 
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
GUANGYUAN PIAO
 
Requêtes sparql
Requêtes sparqlRequêtes sparql
Requêtes sparqlFipBast
 
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...ADBS
 
Lancement de Semanticpédia et DBpédia.fr
Lancement de Semanticpédia et DBpédia.frLancement de Semanticpédia et DBpédia.fr
Lancement de Semanticpédia et DBpédia.fr
Fabien Gandon
 
Thérèse Libourel, atelier Ontologies avec Protégé
Thérèse Libourel, atelier Ontologies avec ProtégéThérèse Libourel, atelier Ontologies avec Protégé
Thérèse Libourel, atelier Ontologies avec Protégé
UMR 7324 CITERES - Laboratoire Archéologie et Territoires, Tours
 
Thérèse Libourel, Ontologies en SHS, 2015-11-09, Tours
Thérèse Libourel, Ontologies en SHS, 2015-11-09, ToursThérèse Libourel, Ontologies en SHS, 2015-11-09, Tours
Thérèse Libourel, Ontologies en SHS, 2015-11-09, Tours
UMR 7324 CITERES - Laboratoire Archéologie et Territoires, Tours
 

Viewers also liked (9)

Using DBpedia for Thesaurus Management and Linked Open Data Integration
Using DBpedia for Thesaurus Management and Linked Open Data IntegrationUsing DBpedia for Thesaurus Management and Linked Open Data Integration
Using DBpedia for Thesaurus Management and Linked Open Data Integration
 
Portails documentaires et référentiels du Web sémantique : exemples et enjeu...
Portails documentaires et  référentiels du Web sémantique : exemples et enjeu...Portails documentaires et  référentiels du Web sémantique : exemples et enjeu...
Portails documentaires et référentiels du Web sémantique : exemples et enjeu...
 
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
 
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
 
Requêtes sparql
Requêtes sparqlRequêtes sparql
Requêtes sparql
 
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
Les quatre aveugles et l'éléphant web, ou les chroniques d'un web non documen...
 
Lancement de Semanticpédia et DBpédia.fr
Lancement de Semanticpédia et DBpédia.frLancement de Semanticpédia et DBpédia.fr
Lancement de Semanticpédia et DBpédia.fr
 
Thérèse Libourel, atelier Ontologies avec Protégé
Thérèse Libourel, atelier Ontologies avec ProtégéThérèse Libourel, atelier Ontologies avec Protégé
Thérèse Libourel, atelier Ontologies avec Protégé
 
Thérèse Libourel, Ontologies en SHS, 2015-11-09, Tours
Thérèse Libourel, Ontologies en SHS, 2015-11-09, ToursThérèse Libourel, Ontologies en SHS, 2015-11-09, Tours
Thérèse Libourel, Ontologies en SHS, 2015-11-09, Tours
 

Similar to DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia

Data_Science.ppt
Data_Science.pptData_Science.ppt
Data_Science.ppt
ANGADPRAJAPATI3
 
Where is my data (in the cloud) tamir dresher
Where is my data (in the cloud)   tamir dresherWhere is my data (in the cloud)   tamir dresher
Where is my data (in the cloud) tamir dresher
Tamir Dresher
 
Where is my data (in the cloud) tamir dresher
Where is my data (in the cloud)   tamir dresherWhere is my data (in the cloud)   tamir dresher
Where is my data (in the cloud) tamir dresherTamir Dresher
 
Where is my data (in the cloud) tamir dresher
Where is my data (in the cloud)   tamir dresherWhere is my data (in the cloud)   tamir dresher
Where is my data (in the cloud) tamir dresher
Tamir Dresher
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
Jeroen Rombouts
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
Sarah Jones
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Heiko Paulheim
 
Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in Chennai
Vijay Susheedran C G
 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introductionNeeraj Tewari
 
Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect data
Andy Stretton
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDM
Marieke Guy
 
Quettra Design Problem Solution - Deepti Chafekar
Quettra Design Problem Solution - Deepti ChafekarQuettra Design Problem Solution - Deepti Chafekar
Quettra Design Problem Solution - Deepti Chafekar
quettra
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
Petar Ristoski
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
Gezim Sejdiu
 
week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt
week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.pptweek1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt
week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt
RidoVercascade
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Trey Grainger
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Lucidworks
 
Datamininglecture
DatamininglectureDatamininglecture
Datamininglecture
Manish Rana
 
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph MiningDetection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Mikel Emaldi Manrique
 
data mining
data miningdata mining
data mining
nehaanand123
 

Similar to DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia (20)

Data_Science.ppt
Data_Science.pptData_Science.ppt
Data_Science.ppt
 
Where is my data (in the cloud) tamir dresher
Where is my data (in the cloud)   tamir dresherWhere is my data (in the cloud)   tamir dresher
Where is my data (in the cloud) tamir dresher
 
Where is my data (in the cloud) tamir dresher
Where is my data (in the cloud)   tamir dresherWhere is my data (in the cloud)   tamir dresher
Where is my data (in the cloud) tamir dresher
 
Where is my data (in the cloud) tamir dresher
Where is my data (in the cloud)   tamir dresherWhere is my data (in the cloud)   tamir dresher
Where is my data (in the cloud) tamir dresher
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
 
Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in Chennai
 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introduction
 
Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect data
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDM
 
Quettra Design Problem Solution - Deepti Chafekar
Quettra Design Problem Solution - Deepti ChafekarQuettra Design Problem Solution - Deepti Chafekar
Quettra Design Problem Solution - Deepti Chafekar
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
 
week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt
week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.pptweek1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt
week1-thursday-2id50-q2-2021-2022-intro-and-basic-fd.ppt
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
 
Datamininglecture
DatamininglectureDatamininglecture
Datamininglecture
 
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph MiningDetection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
 
data mining
data miningdata mining
data mining
 

More from Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Heiko Paulheim
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
Heiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
Heiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
Heiko Paulheim
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Heiko Paulheim
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
Heiko Paulheim
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Heiko Paulheim
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Heiko Paulheim
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
Heiko Paulheim
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Heiko Paulheim
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
Heiko Paulheim
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
Heiko Paulheim
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
Heiko Paulheim
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
Heiko Paulheim
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
Heiko Paulheim
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
Heiko Paulheim
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Heiko Paulheim
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
Heiko Paulheim
 

More from Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 

DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia

  • 1. DBpediaNYD – A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia 10/22/13 Paulheim Heiko Paulheim Heiko 1
  • 2. Motivation • There are quite a few approaches to entity ranking/ statement weighting on Linked Data – and DBpedia in particular • Examples: – Franz et al. (2009) – Tensor Decomposition – Meij et al. (2009) – Machine Learning – Mirizzi et al. (2010) – Web Search Engines – Mulay and Kumar (2011) – Machine Learning – Hees et al. (2012) – Crowd Sourcing – Nunes et al. (2012) – Social Network Analysis 10/22/13 Heiko Paulheim 2
  • 3. Motivation • However, – none of those have been competitively evaluated – none of those have been evaluated at large scale • Evaluation with – small private data sets – user studies • Approaches using Machine Learning – requires training data – expensive to obtain 10/22/13 Heiko Paulheim 3
  • 4. The Dataset • Large-scale dataset (several thousand instances) – statements with strengths • Strength value: Normalized Google Distance • f(x): number of search results containing x • f(x,y): number of search results containing both x and y • M: number of pages in search engine index • NGD has been shown to correlate with human strength associations 10/22/13 Heiko Paulheim 4
  • 5. The Dataset • NGD is a symmetric value – NYD dataset also contains asymmetric values • Asymmetric Normalized Google Distance • f(x): number of search results containing x • f(x,y): number of search results containing both x and y • M: number of pages in search engine index 10/22/13 Heiko Paulheim 5
  • 6. Constructing the Dataset • We sampled 10,000 statements – with DBpedia resources as subject and object (e.g., no type statements, no literals) – with dbpedia or dbpprop predicate • ...and computed symmetric/asymmetric NGD – using the labels as search strings – using Yahoo BOSS 10/22/13 Heiko Paulheim 6
  • 7. The Dataset • Random sample of 10,000 statements – i.e., 30,000 search engine calls (80c/1,000 → 24 USD) • 3,058 pairs of resources had to be discarded – f(x)<f(x,y) or f(y)<f(x,y) – search engines sometimes don't count properly :-( • Result: – 6,942 weighted statements (symmetric) – 13,884 weighted statements (asymmetric) 10/22/13 Heiko Paulheim 7
  • 8. The Dataset • Example: – dbpedia:John_Lennon and dbpedia:Yoko_Ono • Distances: – symmetric: 0.18 – John Lennon → Yoko Ono 0.18 – Yoko Ono → John Lennon 0.03 • Explanation: – Yoko Ono is famous for being John Lennon's wife • and most often mentioned in that context – John Lennon is more famous for being a member of the Beatles 10/22/13 Heiko Paulheim 8
  • 9. Example: the DBpedia FindRelated Service • We trained two regression SVMs (LibSVM) based on DBpediaNYD – one for symmetric, one for asymmetric – service allows for finding the most related among the linked resources • Example results: • http://wiki.dbpedia.org/FindRelated 10/22/13 Heiko Paulheim 9
  • 10. Conclusion and Outlook • DBpediaNYD allows for large scale evaluation – rather a silver standard – does not replace manually created gold standards • Future work – validate DBpediaNYD with users – compare search engines 10/22/13 Heiko Paulheim 10
  • 11. Something Completely Different • Challenges enumerated in the workshop intro this morning – “Logical inference on noisy data” • Talk on “Type Inference on Noisy RDF Data” – Was actually applied for DBpedia 3.9 – Friday, 3:15, Bayside 204A 10/22/13 Heiko Paulheim 11
  • 12. DBpediaNYD – A Silver Standard Benchmark Dataset for Semantic Relatedness in DBpedia 10/22/13 Paulheim Heiko Paulheim Heiko 12