DRETA: EXTRACTING RDF FROM WIKITABLES
Emir Muñoz, Aidan Hogan, Alessandra Mileo
National University of Ireland, Galway

WIKITABLE SURVEY

TABLE TAXONOMY: [figure]

DISTRIBUTIONS: [figure]

MOTIVATION

QUERY:
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?player
WHERE {
  ?player dbp:currentclub <http://dbpedia.org/resource/Manchester_United_F.C.> .
}
RESULTS:
player
http://dbpedia.org/resource/David_de_Gea
http://dbpedia.org/resource/Rafael_Pereira_da_Silva_(footballer_born_1990)
http://dbpedia.org/resource/Patrice_Evra
…
http://dbpedia.org/resource/Fabio_Pereira_da_Silva
http://dbpedia.org/resource/Tom_Cleverley
http://dbpedia.org/resource/Darren_Fletcher
… INCOMPLETE RESULTS!
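The MOTIVATION query can also be issued programmatically. The sketch below builds a plain HTTP GET following the SPARQL 1.1 Protocol and asks for the standard SPARQL JSON results format. It assumes the public endpoint at https://dbpedia.org/sparql; the helper names `build_request` and `players_from_json` are hypothetical. The club IRI is written in full because the DBpedia resource name ends with a period, which a prefixed name such as dbr:… cannot end with. Since DBpedia is updated over time, the players returned today will likely differ from the list above.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Assumed public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?player
WHERE {
  ?player dbp:currentclub <http://dbpedia.org/resource/Manchester_United_F.C.> .
}"""


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Hypothetical helper: a SPARQL 1.1 Protocol 'query via GET' request for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def players_from_json(results: dict) -> list[str]:
    """Hypothetical helper: pull the ?player bindings out of a SPARQL JSON results document."""
    return [b["player"]["value"] for b in results["results"]["bindings"]]


# Usage (needs network access, so it is left commented out):
# import json
# from urllib.request import urlopen
# with urlopen(build_request(QUERY)) as resp:
#     print(players_from_json(json.load(resp)))
```

Sending the query via GET keeps the sketch dependency-free; a dedicated SPARQL client library would work equally well.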

PROPOSAL

[FIGURE: EXAMPLE WIKITABLE ROWS WITH DBPEDIA RELATIONS]
http://dbpedia.org/resource/Manchester_United_F.C.
http://dbpedia.org/resource/David_de_Gea | http://dbpedia.org/resource/Spain | http://dbpedia.org/resource/Goalkeeper_(association_football)
  RELATIONS: dbp:currentclub, dbp:position
…
http://dbpedia.org/resource/Wayne_Rooney | http://dbpedia.org/resource/England | http://dbpedia.org/resource/Forward_(association_football)
  RELATIONS: dbo:birthPlace, dbp:position
…
http://dbpedia.org/resource/Fabio_Pereira_da_Silva | http://dbpedia.org/resource/Brazil | http://dbpedia.org/resource/Defender_(association_football)
  RELATIONS: dbp:position

SUGGESTED TRIPLES:
(1) dbr:David_de_Gea dbo:birthPlace dbr:Spain .
(2) dbr:Fabio_Pereira_da_Silva dbo:birthPlace dbr:Brazil .
(3) dbr:Fabio_Pereira_da_Silva dbp:currentclub <http://dbpedia.org/resource/Manchester_United_F.C.> .

(1) EXTRACTED 34.9 MILLION UNIQUE & NOVEL TRIPLES FROM 1.14 MILLION WIKITABLES
(8 MACHINES: 4 GB RAM, 2.2 GHZ SINGLE CORE; 12 DAYS)

(2) INITIAL EVALUATION:
(MANUAL ANNOTATION; THREE JUDGES; 750 TRIPLES EACH)

(3) MACHINE LEARNING CLASSIFIERS:
(CONSENSUS GOLD STANDARD; VARIETY OF FEATURES)
BAGGING DECISION TREES; SUPPORT VECTOR MACHINES (FROM 1.14 MILLION WIKITABLES):
7.9 MILLION TRIPLES @ 81.5% PRECISION
15.3 MILLION TRIPLES @ 72.4% PRECISION
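The idea behind the SUGGESTED TRIPLES shown above can be sketched in a few lines: if the knowledge base already links the cells of two columns by some predicate in at least one row, propose that predicate for the other rows where the link is missing. This is a simplified illustration under stated assumptions, not the DRETa implementation. It assumes cells are already linked to DBpedia IRIs, treats the article's main entity (dbr:Manchester_United_F.C.) as an extra column on every row, uses a toy knowledge base built from the relations in the example figure plus one assumed dbp:currentclub fact for Wayne Rooney, and omits the machine-learning filtering step entirely. The function name `suggest_triples` is hypothetical.

```python
from collections import defaultdict


def suggest_triples(rows, protagonist, kb):
    """Hypothetical sketch: propose KB predicates seen between two columns for the other rows.

    rows: list of rows, each a list of entity IRIs (None for a cell without an entity).
    protagonist: entity the article is about, appended to every row as an extra column (assumption).
    kb: set of (subject, predicate, object) triples already in the knowledge base.
    """
    table = [list(row) + [protagonist] for row in rows]
    n_cols = len(table[0])

    # Index the KB: (subject, object) -> predicates linking them.
    links = defaultdict(set)
    for s, p, o in kb:
        links[(s, o)].add(p)

    # Step 1: for each ordered column pair, collect predicates the KB asserts
    # between the two cells of at least one row.
    column_preds = defaultdict(set)
    for row in table:
        for i in range(n_cols):
            for j in range(n_cols):
                if i != j and row[i] and row[j]:
                    column_preds[(i, j)] |= links.get((row[i], row[j]), set())

    # Step 2: suggest each such predicate for every row where the triple is missing.
    suggestions = set()
    for row in table:
        for (i, j), preds in column_preds.items():
            if not (row[i] and row[j]):
                continue
            for p in preds:
                triple = (row[i], p, row[j])
                if triple not in kb:
                    suggestions.add(triple)
    return suggestions


MUFC = "dbr:Manchester_United_F.C."
ROWS = [
    ["dbr:David_de_Gea", "dbr:Spain", "dbr:Goalkeeper_(association_football)"],
    ["dbr:Wayne_Rooney", "dbr:England", "dbr:Forward_(association_football)"],
    ["dbr:Fabio_Pereira_da_Silva", "dbr:Brazil", "dbr:Defender_(association_football)"],
]
# Toy KB (assumed): relations from the example figure, plus Rooney's club.
KB = {
    ("dbr:David_de_Gea", "dbp:currentclub", MUFC),
    ("dbr:David_de_Gea", "dbp:position", "dbr:Goalkeeper_(association_football)"),
    ("dbr:Wayne_Rooney", "dbp:currentclub", MUFC),  # assumption, not shown on the poster
    ("dbr:Wayne_Rooney", "dbo:birthPlace", "dbr:England"),
    ("dbr:Wayne_Rooney", "dbp:position", "dbr:Forward_(association_football)"),
    ("dbr:Fabio_Pereira_da_Silva", "dbp:position", "dbr:Defender_(association_football)"),
}

for t in sorted(suggest_triples(ROWS, MUFC, KB)):
    print(*t)
```

On this toy data the loop prints exactly the three SUGGESTED TRIPLES. Because any predicate seen once in a column pair is proposed for every other row, the raw output can contain wrong facts (a player's nationality is not always his birth place), which motivates a filtering step such as the classifiers in (3).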

DEMO … http://emunoz.org/wikitables
Enabling Networked Knowledge

ACKNOWLEDGEMENTS: This work was funded in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).
