WDPlus: Leveraging Wikidata to Link and Extend Tabular Data

dgarijo
WDPlus:
Leveraging Wikidata to Link and
Extend Tabular Data
Daniel Garijo, Pedro Szekely
Information Sciences Institute and
Department of Computer Science
@dgarijov
dgarijo@isi.edu
Abundance of data sources in the Web
Users of data face three challenges
• How do I find relevant datasets
for my problem?
• How do I augment my dataset
with existing information?
• How can I share my integrated
results with the community?
Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
2
Raw data
Popular initiatives for addressing these challenges
• Search individual items
• Search is manual, based on user input
• LOD cloud of connected datasets
• Knowledge engineers are needed to map and
augment content
• ETL Frameworks (e.g, Karma, Open Refine)
• Pipelines are custom, expertise required
• Often not shared
Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
3
Sources: https://lod-cloud.net/versions/2019-03-29/lod-cloud.png; https://panoply.io/data-warehouse-guide/3-ways-to-build-an-etl-process/
WDPlus
A framework designed to:
• Discover data on the Web
• Improve raw data to make it useful
• Search, querying dataset structure
• Download fresh data
• Combine existing dataset
• Share improved data and methods
Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
4
1953-01-20
Harry Truman
Lamar
President
USA
1884-05-08
1972-12-26
male
1945-04-12
Bress Truman
weath
er
shoppi
ng
crime
sports
Core
Satellites
Metadata index
WDPlus architecture
Wikidata as a core KG
• 60 Million items
• 700 Million statements
• 20,000 + contributors
• +1 billion edits
• Collaborative!
5
1953-01-20
Harry Truman
Lamar
President
USA
1884-05-08
1972-12-26
male
1945-04-12
Bress Truman
Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
Core
WDPlus architecture
Satellite organization
• Detailed information on a domain
• Crime records, sport events, etc.
• Linked to the Wikidata core
• Link first strategy
• Custom properties and Qnodes
• Extensions to core model
• Synchronized with core
• Decentralized
• 1 satellite may be maintained by 1
community
6Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
1953-01-20
Harry Truman
Lamar
President
USA
1884-05-08
1972-12-26
male
1945-04-12
Bress Truman
weath
er
shoppi
ng
crime
sports
Core
Satellites
WDPlus architecture
Table models
• Tables are not materialized
• Able to become a satellite under
demand
• Described in machine-readable
metadata index
• Indexing columns names and
relevant instances for fast retrieval
• Link to table model is preserved
7Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
1953-01-20
Harry Truman
Lamar
President
USA
1884-05-08
1972-12-26
male
1945-04-12
Bress Truman
weath
er
shoppi
ng
crime
sports
Core
Satellites
Metadata index
Towards WDPLus
8Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
1953-01-20
Harry Truman
Lamar
President
USA
1884-05-08
1972-12-26
male
1945-04-12
Bress Truman
weath
er
shoppi
ng
crime
sports
Core
Satellites
Metadata index
WDPlus framework: Metadata index and
table Augmentation
• Search
• Keywords, variables or content
• Wikifier may be used in search
• Download
• Download a dataset or its metadata
• Augment
• Merge your dataset with contents from
other datasets automatically
• Upload
• Add new datasets (automated metadata
profiling and provenance)
• Enrich
• Header enrichment for search efficiency
9Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
WDPlus framework: T2WML
10Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
Table overview
Cell-based mapping.
This mapping is
saved in WDPlus for
future reference
Entity Linking
Result sample
Easy to share!
Creating Wikidata Satellites: Challenges
• Identify new properties to model satellites
• Currently done by hand by Knowledge engineers
• Creation of new Qnodes for satellite instances
• Identified a schema for each satellite
• Feedback loop to Wikidata
• How to select a “trusty” statement when several values are available?
• Namespace issues
• Single namespace, or namespace per satellite?
• Inter-satellite linkages
11Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
Conclusions
• Tabular data exists in heterogeneous formats
• Difficult to find, use, augment and share
• WDPlus is a framework to help discover, improve, search, augment,
combine and share tabular data
• WDPlus framework for profiling and enriching datasets
• T2WML language to generate linked instances from tabular data
• Encouraging early results on usability
12Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
Help us extend WDPlus!
Do you have comments, suggestions or use cases? Contact me at:
Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend
Tabular Data. AKTS 2019 (eScience 2019)
13
dgarijo@isi.edu
1 of 13

Recommended

bringing Library and Researcher/Developer communities together to bridge the ... by
bringing Library and Researcher/Developer communities together to bridge the ...bringing Library and Researcher/Developer communities together to bridge the ...
bringing Library and Researcher/Developer communities together to bridge the ...ARDC
303 views12 slides
Earth Science Data Community Perspectives on Data Publication by
Earth Science Data Community Perspectives on Data Publication Earth Science Data Community Perspectives on Data Publication
Earth Science Data Community Perspectives on Data Publication Erin Robinson
638 views10 slides
Seeking serendipity by
Seeking serendipitySeeking serendipity
Seeking serendipityAndrew Treloar
893 views15 slides
Dash for IASSIST 2014 by
Dash for IASSIST 2014Dash for IASSIST 2014
Dash for IASSIST 2014Carly Strasser
938 views32 slides
PLOS ALM Talk on UC3 Services and Altmetrics by
PLOS ALM Talk on UC3 Services and AltmetricsPLOS ALM Talk on UC3 Services and Altmetrics
PLOS ALM Talk on UC3 Services and AltmetricsCarly Strasser
885 views15 slides
Carpenter "The Future of the Scholarly Record" by
Carpenter "The Future of the Scholarly Record"Carpenter "The Future of the Scholarly Record"
Carpenter "The Future of the Scholarly Record"National Information Standards Organization (NISO)
208 views75 slides

More Related Content

What's hot

Endings and new beginnings: update on the Jisc Content programme 2011-13 by
Endings and new beginnings: update on the Jisc Content programme 2011-13 Endings and new beginnings: update on the Jisc Content programme 2011-13
Endings and new beginnings: update on the Jisc Content programme 2011-13 PaolaMarchionni
936 views8 slides
Cornell 2011 05-13 by
Cornell 2011 05-13Cornell 2011 05-13
Cornell 2011 05-13Johannes Keizer
458 views30 slides
Data Management Solutions from Libraries at NSF Large Facilities Workshop by
Data Management Solutions from Libraries at NSF Large Facilities WorkshopData Management Solutions from Libraries at NSF Large Facilities Workshop
Data Management Solutions from Libraries at NSF Large Facilities WorkshopCarly Strasser
655 views26 slides
Big data and the dark arts - Jisc Digital Media 2015 by
Big data and the dark arts - Jisc Digital Media 2015Big data and the dark arts - Jisc Digital Media 2015
Big data and the dark arts - Jisc Digital Media 2015Jisc
4.9K views87 slides
Progress with FITS for analyzing video by
Progress with FITS for analyzing videoProgress with FITS for analyzing video
Progress with FITS for analyzing videoprwheatley
516 views8 slides
Research data spring: clipper by
Research data spring: clipperResearch data spring: clipper
Research data spring: clipperJisc RDM
5.1K views17 slides

What's hot(19)

Endings and new beginnings: update on the Jisc Content programme 2011-13 by PaolaMarchionni
Endings and new beginnings: update on the Jisc Content programme 2011-13 Endings and new beginnings: update on the Jisc Content programme 2011-13
Endings and new beginnings: update on the Jisc Content programme 2011-13
PaolaMarchionni936 views
Data Management Solutions from Libraries at NSF Large Facilities Workshop by Carly Strasser
Data Management Solutions from Libraries at NSF Large Facilities WorkshopData Management Solutions from Libraries at NSF Large Facilities Workshop
Data Management Solutions from Libraries at NSF Large Facilities Workshop
Carly Strasser655 views
Big data and the dark arts - Jisc Digital Media 2015 by Jisc
Big data and the dark arts - Jisc Digital Media 2015Big data and the dark arts - Jisc Digital Media 2015
Big data and the dark arts - Jisc Digital Media 2015
Jisc4.9K views
Progress with FITS for analyzing video by prwheatley
Progress with FITS for analyzing videoProgress with FITS for analyzing video
Progress with FITS for analyzing video
prwheatley516 views
Research data spring: clipper by Jisc RDM
Research data spring: clipperResearch data spring: clipper
Research data spring: clipper
Jisc RDM5.1K views
Web Science Synergies: Exploring Web Knowledge through the Semantic Web by Stefan Dietze
Web Science Synergies: Exploring Web Knowledge through the Semantic WebWeb Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Stefan Dietze764 views
Data management overview and UC3 tools for IASSIST 2014 by Carly Strasser
Data management overview and UC3 tools for IASSIST 2014Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014
Carly Strasser732 views
NREN 3.0 by Ed Dodds
NREN 3.0NREN 3.0
NREN 3.0
Ed Dodds408 views
DPSY 6121/8121 Week 11 Dscussion by eckchela
DPSY 6121/8121 Week 11 DscussionDPSY 6121/8121 Week 11 Dscussion
DPSY 6121/8121 Week 11 Dscussion
eckchela81 views
Cal Poly - Data Management for Researchers by Carly Strasser
Cal Poly - Data Management for ResearchersCal Poly - Data Management for Researchers
Cal Poly - Data Management for Researchers
Carly Strasser747 views
Open Data Movement around the Globe by Chingteng Hsiao
Open Data Movement around the GlobeOpen Data Movement around the Globe
Open Data Movement around the Globe
Chingteng Hsiao1.8K views
DataViz_What_How_Why by Shweta Gupte
DataViz_What_How_WhyDataViz_What_How_Why
DataViz_What_How_Why
Shweta Gupte165 views
Ink On Our Hands: Plotting the Map of Canada's Integrated Digital Scholarship... by Hamilton Public Library
Ink On Our Hands: Plotting the Map of Canada's Integrated Digital Scholarship...Ink On Our Hands: Plotting the Map of Canada's Integrated Digital Scholarship...
Ink On Our Hands: Plotting the Map of Canada's Integrated Digital Scholarship...
Exploring platform boundary resources with a data-driven approach by Jukka Huhtamäki
Exploring platform boundary resources with a data-driven approachExploring platform boundary resources with a data-driven approach
Exploring platform boundary resources with a data-driven approach
Jukka Huhtamäki873 views
The Tribal Approach Academia Takes to Research Data Management by LIBER Europe
The Tribal Approach Academia Takes to Research Data ManagementThe Tribal Approach Academia Takes to Research Data Management
The Tribal Approach Academia Takes to Research Data Management
LIBER Europe152 views

Similar to WDPlus: Leveraging Wikidata to Link and Extend Tabular Data

Wikidata Introductory Workshop by
Wikidata Introductory WorkshopWikidata Introductory Workshop
Wikidata Introductory WorkshopBeat Estermann
221 views36 slides
Introduction_to_knowledge_graph.pdf by
Introduction_to_knowledge_graph.pdfIntroduction_to_knowledge_graph.pdf
Introduction_to_knowledge_graph.pdfJaberRad1
2 views43 slides
Hahn "Wikidata as a hub to library linked data re-use" by
Hahn "Wikidata as a hub to library linked data re-use"Hahn "Wikidata as a hub to library linked data re-use"
Hahn "Wikidata as a hub to library linked data re-use"National Information Standards Organization (NISO)
459 views32 slides
Knowledge Architecture: Graphing Your Knowledge by
Knowledge Architecture: Graphing Your KnowledgeKnowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your KnowledgeNeo4j
2.9K views30 slides
Hala skafkeynote@conferencedata2021 by
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021hala Skaf
113 views143 slides
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour by
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourBeyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourKNOWeSCAPE2014
714 views53 slides

Similar to WDPlus: Leveraging Wikidata to Link and Extend Tabular Data(20)

Wikidata Introductory Workshop by Beat Estermann
Wikidata Introductory WorkshopWikidata Introductory Workshop
Wikidata Introductory Workshop
Beat Estermann221 views
Introduction_to_knowledge_graph.pdf by JaberRad1
Introduction_to_knowledge_graph.pdfIntroduction_to_knowledge_graph.pdf
Introduction_to_knowledge_graph.pdf
JaberRad12 views
Knowledge Architecture: Graphing Your Knowledge by Neo4j
Knowledge Architecture: Graphing Your KnowledgeKnowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your Knowledge
Neo4j2.9K views
Hala skafkeynote@conferencedata2021 by hala Skaf
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021
hala Skaf113 views
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour by KNOWeSCAPE2014
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourBeyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
KNOWeSCAPE2014714 views
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena... by Anita de Waard
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Anita de Waard1.6K views
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo... by CILIP MDG
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
CILIP MDG21 views
Estermann Wikidata and Heritage Data 20170914 by Beat Estermann
Estermann Wikidata and Heritage Data 20170914Estermann Wikidata and Heritage Data 20170914
Estermann Wikidata and Heritage Data 20170914
Beat Estermann571 views
What is eScience, and where does it go from here? by Daniel S. Katz
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?
Daniel S. Katz307 views
Research Data Sharing LERU by LIBER Europe
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU
LIBER Europe1.3K views
Rober stephenson by NASAPMC
Rober stephensonRober stephenson
Rober stephenson
NASAPMC13.9K views
Keynote: Mark Parsons - Plans are Useless, But Planning is Essential by CASRAI
Keynote: Mark Parsons - Plans are Useless, But Planning is EssentialKeynote: Mark Parsons - Plans are Useless, But Planning is Essential
Keynote: Mark Parsons - Plans are Useless, But Planning is Essential
CASRAI704 views
AI, Knowledge Representation and Graph Databases -
 Key Trends in Data Science by Optum
AI, Knowledge Representation and Graph Databases -
 Key Trends in Data ScienceAI, Knowledge Representation and Graph Databases -
 Key Trends in Data Science
AI, Knowledge Representation and Graph Databases -
 Key Trends in Data Science
Optum1.4K views
Global identities for a global knowledge by Carlos Iglesias
Global identities for a global knowledgeGlobal identities for a global knowledge
Global identities for a global knowledge
Carlos Iglesias372 views
Driving Innovation with Knowledge Sharing and Open Data by Jeanne Holm
Driving Innovation with Knowledge Sharing and Open DataDriving Innovation with Knowledge Sharing and Open Data
Driving Innovation with Knowledge Sharing and Open Data
Jeanne Holm689 views

More from dgarijo

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles by
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesdgarijo
516 views8 slides
FAIR Workflows: A step closer to the Scientific Paper of the Future by
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Futuredgarijo
618 views36 slides
Towards Reusable Research Software by
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Softwaredgarijo
171 views9 slides
SOMEF: a metadata extraction framework from software documentation by
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationdgarijo
121 views7 slides
A Template-Based Approach for Annotating Long-Tailed Datasets by
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasetsdgarijo
144 views12 slides
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs by
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphsdgarijo
423 views21 slides

More from dgarijo(20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles by dgarijo
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
dgarijo516 views
FAIR Workflows: A step closer to the Scientific Paper of the Future by dgarijo
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
dgarijo618 views
Towards Reusable Research Software by dgarijo
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
dgarijo171 views
SOMEF: a metadata extraction framework from software documentation by dgarijo
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
dgarijo121 views
A Template-Based Approach for Annotating Long-Tailed Datasets by dgarijo
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
dgarijo144 views
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs by dgarijo
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
dgarijo423 views
Towards Knowledge Graphs of Reusable Research Software Metadata by dgarijo
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
dgarijo624 views
Scientific Software Registry Collaboration Workshop: From Software Metadata r... by dgarijo
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
dgarijo460 views
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M... by dgarijo
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
dgarijo1.8K views
Towards Human-Guided Machine Learning - IUI 2019 by dgarijo
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
dgarijo545 views
Capturing Context in Scientific Experiments: Towards Computer-Driven Science by dgarijo
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo551 views
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met... by dgarijo
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
dgarijo583 views
WIDOCO: A Wizard for Documenting Ontologies by dgarijo
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
dgarijo1.2K views
Towards Automating Data Narratives by dgarijo
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
dgarijo918 views
Automated Hypothesis Testing with Large Scale Scientific Workflows by dgarijo
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
dgarijo586 views
OntoSoft: A Distributed Semantic Registry for Scientific Software by dgarijo
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
dgarijo919 views
OEG tools for supporting Ontology Engineering by dgarijo
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
dgarijo289 views
Software Metadata: Describing "dark software" in GeoSciences by dgarijo
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
dgarijo901 views
Reproducibility Using Semantics: An Overview by dgarijo
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
dgarijo890 views
PhD Thesis: Mining abstractions in scientific workflows by dgarijo
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
dgarijo1.8K views

Recently uploaded

Saikat Chakraborty Java Oracle Certificate.pdf by
Saikat Chakraborty Java Oracle Certificate.pdfSaikat Chakraborty Java Oracle Certificate.pdf
Saikat Chakraborty Java Oracle Certificate.pdfSaikatChakraborty787148
13 views1 slide
SWM L15-L28_drhasan (Part 2).pdf by
SWM L15-L28_drhasan (Part 2).pdfSWM L15-L28_drhasan (Part 2).pdf
SWM L15-L28_drhasan (Part 2).pdfMahmudHasan747870
28 views93 slides
SNMPx by
SNMPxSNMPx
SNMPxAmatullahbutt
12 views12 slides
13_DVD_Latch-up_prevention.pdf by
13_DVD_Latch-up_prevention.pdf13_DVD_Latch-up_prevention.pdf
13_DVD_Latch-up_prevention.pdfUsha Mehta
9 views16 slides
Electronic Devices - Integrated Circuit.pdf by
Electronic Devices - Integrated Circuit.pdfElectronic Devices - Integrated Circuit.pdf
Electronic Devices - Integrated Circuit.pdfbooksarpita
11 views46 slides
Deutsch Crimping by
Deutsch CrimpingDeutsch Crimping
Deutsch CrimpingIwiss Tools Co.,Ltd
19 views7 slides

Recently uploaded(20)

13_DVD_Latch-up_prevention.pdf by Usha Mehta
13_DVD_Latch-up_prevention.pdf13_DVD_Latch-up_prevention.pdf
13_DVD_Latch-up_prevention.pdf
Usha Mehta9 views
Electronic Devices - Integrated Circuit.pdf by booksarpita
Electronic Devices - Integrated Circuit.pdfElectronic Devices - Integrated Circuit.pdf
Electronic Devices - Integrated Circuit.pdf
booksarpita11 views
Informed search algorithms.pptx by Dr.Shweta
Informed search algorithms.pptxInformed search algorithms.pptx
Informed search algorithms.pptx
Dr.Shweta12 views
DevOps to DevSecOps: Enhancing Software Security Throughout The Development L... by Anowar Hossain
DevOps to DevSecOps: Enhancing Software Security Throughout The Development L...DevOps to DevSecOps: Enhancing Software Security Throughout The Development L...
DevOps to DevSecOps: Enhancing Software Security Throughout The Development L...
Anowar Hossain10 views
cloud computing-virtualization.pptx by RajaulKarim20
cloud computing-virtualization.pptxcloud computing-virtualization.pptx
cloud computing-virtualization.pptx
RajaulKarim2082 views
2_DVD_ASIC_Design_FLow.pdf by Usha Mehta
2_DVD_ASIC_Design_FLow.pdf2_DVD_ASIC_Design_FLow.pdf
2_DVD_ASIC_Design_FLow.pdf
Usha Mehta14 views
Literature review and Case study on Commercial Complex in Nepal, Durbar mall,... by AakashShakya12
Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...
Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...
AakashShakya1245 views
CHI-SQUARE ( χ2) TESTS.pptx by ssusera597c5
CHI-SQUARE ( χ2) TESTS.pptxCHI-SQUARE ( χ2) TESTS.pptx
CHI-SQUARE ( χ2) TESTS.pptx
ssusera597c520 views
Multi-objective distributed generation integration in radial distribution sy... by IJECEIAES
Multi-objective distributed generation integration in radial  distribution sy...Multi-objective distributed generation integration in radial  distribution sy...
Multi-objective distributed generation integration in radial distribution sy...
IJECEIAES15 views

WDPlus: Leveraging Wikidata to Link and Extend Tabular Data

  • 1. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data Daniel Garijo, Pedro Szekely Information Sciences Institute and Department of Computer Science @dgarijov dgarijo@isi.edu
  • 2. Abundance of data sources in the Web Users of data face three challenges • How do I find relevant datasets for my problem? • How do I augment my dataset with existing information? • How can I share my integrated results with the community? Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) 2 Raw data
  • 3. Popular initiatives for addressing these challenges • Search individual items • Search is manual, based on user input • LOD cloud of connected datasets • Knowledge engineers are needed to map and augment content • ETL Frameworks (e.g, Karma, Open Refine) • Pipelines are custom, expertise required • Often not shared Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) 3 Sources: https://lod-cloud.net/versions/2019-03-29/lod-cloud.png; https://panoply.io/data-warehouse-guide/3-ways-to-build-an-etl-process/
  • 4. WDPlus A framework designed to: • Discover data on the Web • Improve raw data to make it useful • Search, querying dataset structure • Download fresh data • Combine existing dataset • Share improved data and methods Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) 4 1953-01-20 Harry Truman Lamar President USA 1884-05-08 1972-12-26 male 1945-04-12 Bress Truman weath er shoppi ng crime sports Core Satellites Metadata index
  • 5. WDPlus architecture Wikidata as a core KG • 60 Million items • 700 Million statements • 20,000 + contributors • +1 billion edits • Collaborative! 5 1953-01-20 Harry Truman Lamar President USA 1884-05-08 1972-12-26 male 1945-04-12 Bress Truman Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) Core
  • 6. WDPlus architecture Satellite organization • Detailed information on a domain • Crime records, sport events, etc. • Linked to the Wikidata core • Link first strategy • Custom properties and Qnodes • Extensions to core model • Synchronized with core • Decentralized • 1 satellite may be maintained by 1 community 6Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) 1953-01-20 Harry Truman Lamar President USA 1884-05-08 1972-12-26 male 1945-04-12 Bress Truman weath er shoppi ng crime sports Core Satellites
  • 7. WDPlus architecture Table models • Tables are not materialized • Able to become a satellite under demand • Described in machine-readable metadata index • Indexing columns names and relevant instances for fast retrieval • Link to table model is preserved 7Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) 1953-01-20 Harry Truman Lamar President USA 1884-05-08 1972-12-26 male 1945-04-12 Bress Truman weath er shoppi ng crime sports Core Satellites Metadata index
  • 8. Towards WDPLus 8Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) 1953-01-20 Harry Truman Lamar President USA 1884-05-08 1972-12-26 male 1945-04-12 Bress Truman weath er shoppi ng crime sports Core Satellites Metadata index
  • 9. WDPlus framework: Metadata index and table Augmentation • Search • Keywords, variables or content • Wikifier may be used in search • Download • Download a dataset or its metadata • Augment • Merge your dataset with contents from other datasets automatically • Upload • Add new datasets (automated metadata profiling and provenance) • Enrich • Header enrichment for search efficiency 9Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019)
  • 10. WDPlus framework: T2WML 10Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) Table overview Cell-based mapping. This mapping is saved in WDPlus for future reference Entity Linking Result sample Easy to share!
  • 11. Creating Wikidata Satellites: Challenges • Identify new properties to model satellites • Currently done by hand by Knowledge engineers • Creation of new Qnodes for satellite instances • Identified a schema for each satellite • Feedback loop to Wikidata • How to select a “trusty” statement when several values are available? • Namespace issues • Single namespace, or namespace per satellite? • Inter-satellite linkages 11Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019)
  • 12. Conclusions • Tabular data exists in heterogeneous formats • Difficult to find, use, augment and share • WDPlus is a framework to help discover, improve, search, augment, combine and share tabular data • WDPlus framework for profiling and enriching datasets • T2WML language to generate linked instances from tabular data • Encouraging early results on usability 12Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019)
  • 13. Help us extend WDPlus! Do you have comments, suggestions or use cases? Contact me at: Daniel Garijo and Pedro Szekely. WDPlus: Leveraging Wikidata to Link and Extend Tabular Data. AKTS 2019 (eScience 2019) 13 dgarijo@isi.edu