Gathering Alternative Surface Forms
for DBpedia Entities
Volha Bryl
University of Mannheim, Germany → Springer Nature
Christian Bizer, Heiko Paulheim
University of Mannheim, Germany
NLP & DBpedia @ ISWC 2015, Bethlehem, USA, October 11, 2015
2Surface Forms for DBpedia Entities, Bryl, Bizer, Paulheim
Why you need Surface Forms
• The surface forms (SFs) of an entity are the strings by which it can be
referred to: synonyms, alternative names, etc.
• Used to support many NLP tasks: co-reference resolution, entity
linking, disambiguation
“Billionaire Elon Musk has spelled out how he plans to
create temporary suns over Mars in order to heat the
Red Planet. Dismissing earlier comments that he
intended to nuke the planet’s surface, he says he wants
to create aerial explosions to heat it up. ”
To link the three mentions, your machine should know that “Red Planet” is
an alternative name for Mars, and that Mars can be referred to just by its
“type”: planet
Surface Forms from Wiki(DB)pedia
• Some of Wikipedia’s (hence, DBpedia’s) crowd-sourced content looks
quite like surface forms
• Page titles
• Redirects
• Account for alternative names, word forms (e.g. plurals), closely related words,
abbreviations, alternative spellings, likely misspellings, subtopics
• Disambiguation pages
• There are 10+ Bethlehems in the US, according to
https://en.wikipedia.org/wiki/Bethlehem_(disambiguation)
• Anchor texts of links between wiki pages
Named after the Roman god of war, it is often referred to as the “Red
Planet”...
Source: Named after the [[Mars (mythology)|Roman god of war]], it is
often referred to as the "Red Planet"...
• …additionally, we use anchor texts of links from external pages to Wikipedia
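A minimal sketch of how anchor texts could be pulled out of wiki markup like the source line above. The regular expression and the function name `anchor_pairs` are illustrative assumptions, not the DBpedia extractor itself; templates, file links and nested brackets are not handled.

```python
import re
from collections import Counter

# Matches [[Target]] and [[Target|anchor text]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def anchor_pairs(wikitext):
    """Yield (target page, anchor text) for each internal wiki link."""
    for m in WIKILINK.finditer(wikitext):
        target = m.group(1).split("#", 1)[0].strip()  # drop a section fragment
        anchor = (m.group(2) or m.group(1)).strip()   # no pipe: the title is the anchor
        if target:
            yield target, anchor

source = ('Named after the [[Mars (mythology)|Roman god of war]], '
          'it is often referred to as the "Red Planet"')

# Counting pairs gives the link frequencies that a later filter can use.
counts = Counter(anchor_pairs(source))
```

Here "Roman god of war" becomes a candidate surface form for the page "Mars (mythology)", which shows how an anchor text can describe an entity without being one of its names.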
Surface Forms from Wiki(DB)pedia
• Not a new idea
• BabelNet, DBpedia Spotlight, … [see our paper for more links]
• Problem: Quality
• …not only is quality a problem, it has also never been
assessed or addressed
• Reason 1: good quality of Wikipedia content is taken for granted
• Reason 2: the hope is that NLP algorithms won’t be influenced by noise
• Problem: Quality – Why?
• When adding a redirect or the anchor text of an internal Wikipedia link, a Wikipedia
editor might mean not only same as or also known as, but also related to,
contains, etc.
• Both variants serve the purpose of pointing to the correct wiki page
[Figure: Mars in BabelNet]
Solution: Focus on Quality
• Step 1: Extract
• We extract SFs from Wikipedia labels, redirects, disambiguations, and anchor
texts of internal wiki-links
• Step 2: Evaluate
• We create a gold standard to evaluate SF quality
• Step 3: Filter
• We implement three filters to improve SF quality
• Bonus: More SFs
• We extract SFs from anchor texts of Wiki links found in the Common Crawl
2014 corpus
• All datasets are available at
http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/
SFs Dataset Statistics
• LRD = Labels, Redirects, Disambiguations
• Extracted from DBpedia dumps
• WAT = Wikipedia Anchor Texts
• Extracted by a new DBpedia extractor (based on PageLinksExtractor)
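A rough sketch of how LRD surface forms might be collected from DBpedia N-Triples dumps. The three predicate URIs are the ones DBpedia uses for labels, redirects and disambiguation links. Deriving the string from the page URI, and the names `TRIPLE`, `title` and `lrd_surface_forms`, are assumptions made for illustration; the slides do not describe this step in detail.

```python
import re
from urllib.parse import unquote

LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
REDIRECT = "http://dbpedia.org/ontology/wikiPageRedirects"
DISAMBIG = "http://dbpedia.org/ontology/wikiPageDisambiguates"

# Simplified N-Triples line: <s> <p> <o> .  or  <s> <p> "literal"@lang .
# Escape sequences inside literals are not decoded in this sketch.
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"((?:[^"\\]|\\.)*)"\S*)\s*\.\s*$'
)

def title(uri):
    """Turn a DBpedia resource URI into a readable page title."""
    return unquote(uri.rsplit("/", 1)[-1]).replace("_", " ")

def lrd_surface_forms(lines):
    """Yield (entity URI, surface form, source) triples."""
    for line in lines:
        m = TRIPLE.match(line)
        if not m:
            continue
        s, p, o_uri, o_lit = m.groups()
        if p == LABEL and o_lit is not None:
            yield s, o_lit, "label"
        elif p == REDIRECT and o_uri:
            # The redirecting page's title is an SF of the redirect target.
            yield o_uri, title(s), "redirect"
        elif p == DISAMBIG and o_uri:
            # The disambiguation page's title is an SF of every listed page.
            sf = re.sub(r"\s*\(disambiguation\)$", "", title(s))
            yield o_uri, sf, "disambiguation"
```

Keeping the source tag with each surface form makes it possible to evaluate or filter labels, redirects and disambiguations separately.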
Gold Standard
• Manual annotation, 1 annotator, 2 subsets
• Popular subset: manually selected 34 popular entities of different types
• Denmark, Berlin, Apple Inc., Animal Farm, Michael Jackson, Star Wars, Diego
Maradona, Mars, etc.
• ~82 SFs per entity, linked from other Wiki pages 813,736 times
• Random subset: randomly selected 81 entities each having at least 5 SFs
• Andy Zaltzman, Bell AH-1 SuperCobra, Biarritz, Castellum, Firefox (film), Kipchak
languages, ParisTech, Psychokinesis, etc.
• ~13 SFs per entity, linked from other Wiki pages 14,760 times
Available at http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/gold/
Gold Standard
• Types of annotations
• correct (“the eternal city” for Rome)
• contained (“Google Japan” for Google), contains (“Turkey” for Istanbul)
• type of (“the city” for Rome)
• partial (“Diego” for Diego Maradona)
• related (“Google Blog” for Google)
• wrong (“during World War I” for United States)
Evaluation: How many correct SFs?
• SFs extracted from labels, redirects, disambiguations
• correct, popular subset: 66.8%
• correct, random subset: 86.6%
• SFs extracted from Wikipedia anchor texts
• correct, popular subset: 38.5%
• correct, random subset: 70.7%
• Combined dataset
• correct, popular subset: 45.7%
• correct, random subset: 75%
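The percentages above can be read as the share of extracted SFs that the annotator labelled correct. A small sketch of that computation, assuming each SF counts once (the slides do not say whether link frequency was used as a weight); `share_correct` and the toy annotations are hypothetical.

```python
from collections import Counter

def share_correct(annotations):
    """Fraction of (surface form, label) pairs whose label is 'correct'."""
    counts = Counter(label for _sf, label in annotations)
    total = sum(counts.values())
    return counts["correct"] / total if total else 0.0

# Toy annotations for one entity, using the label names of the gold standard.
google = [
    ("Google", "correct"),
    ("Google Inc.", "correct"),
    ("Google Japan", "contained"),
    ("Google Blog", "related"),
]
print(f"{share_correct(google):.1%}")  # prints 50.0%
```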
(1) Filtering: String Patterns
• Data analysis → there are patterns that wrong SFs follow
• URLs: contain .com or .net (“Berlin-china.net” for Berlin)
• of-phrases, with exceptions for city of, state of, and the like (“Issues of
Toronto” for Toronto)
• in-phrases (“Historical sites in Berlin” for Berlin)
• and-phrases (“Tom Cruise and Katie Holmes” for Tom Cruise)
• list-of (“List of Toronto MPs and MPPs” for Toronto)
• Increase in precision
• popular subset: 1.33%
• popular subset, LRD only: 3.75%
• random subset: less than 1%
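The patterns listed above could be approximated with a handful of regular expressions. This is a sketch under assumptions: the exact rules of the original filter are not given on the slide, only the two named of-phrase exceptions are covered ("and the like" is left open), and `matched_pattern` is a hypothetical name.

```python
import re

# Checked in order; the first pattern that matches names the reason for rejection.
_PATTERNS = [
    ("url", re.compile(r"\.(?:com|net)\b", re.I)),
    ("list-of", re.compile(r"^list of\b", re.I)),
    ("in-phrase", re.compile(r"\bin\b", re.I)),
    ("and-phrase", re.compile(r"\band\b", re.I)),
]
_OF = re.compile(r"\bof\b", re.I)
# Only the two exceptions named on the slide; a real list would be longer.
_OF_EXCEPTION = re.compile(r"\b(?:city|state) of\b", re.I)

def matched_pattern(sf):
    """Return the name of the pattern a surface form matches, or None."""
    for name, rx in _PATTERNS:
        if rx.search(sf):
            return name
    if _OF.search(sf) and not _OF_EXCEPTION.search(sf):
        return "of-phrase"
    return None

for sf in ["Berlin-china.net", "Issues of Toronto", "City of Toronto", "Red Planet"]:
    print(sf, "->", matched_pattern(sf))
```

Rules this blunt also reject legitimate names that contain "and" or "in", which is one reason to measure their effect against a gold standard rather than assume it.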
(2) Filtering: Wikidata
• Observation: some SFs are entities on their own in other languages
• E.g. “Neckarau”, a district of Mannheim, redirects to Mannheim in the English
Wikipedia, but has its own page in the German Wikipedia
• Implementation: use DBpedia-Wikidata dumps, released in May 2015
• Check whether an SF exactly matches or is close (Levenshtein distance) to any
of the labels of Wikidata entities that do not have an English Wikipedia page
but have pages in other language editions
• Increase in precision
• 0.5% compared to pattern-based filtering
• 1.5% for SF extracted only from LRD
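A sketch of the matching step, with a plain dynamic-programming Levenshtein distance. The distance cut-off `max_distance=1`, the case folding and the function name `is_separate_entity` are assumptions; the slide only says "close". Loading the labels from the DBpedia-Wikidata dumps is not shown.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def is_separate_entity(sf, non_english_labels, max_distance=1):
    """True if the SF matches, or nearly matches, the label of an entity
    that has Wikipedia pages in other languages but not in English."""
    sf_norm = sf.casefold()
    return any(levenshtein(sf_norm, label.casefold()) <= max_distance
               for label in non_english_labels)

# Hypothetical label set standing in for the Wikidata lookup.
labels = ["Neckarau", "Käfertal"]
print(is_separate_entity("Neckarau", labels))  # True: filter this SF for Mannheim
print(is_separate_entity("Mannheim", labels))  # False: keep it
```

Comparing every SF with every label is quadratic; on full dumps an exact-match set lookup first, with fuzzy matching only as a fallback, would be the more practical order.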
(3) Filtering: Frequency Scores
• For SFs extracted from anchor texts, frequencies are available → TF-IDF scores
• Determining the threshold: values from 1.0 to 8.0 evaluated with a step of 0.2
• Two thresholds selected, with the highest values of F1: 1.8 and 2.6
• Threshold 0 (no filtering) used as baseline
• Increase in precision
• 20% for popular subset, 10% for random subset
* Filtering done on the dataset to which pattern- and Wikidata-based filters are already applied
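The threshold search described above can be sketched as a sweep over 1.0 to 8.0 in steps of 0.2 that keeps the value with the highest F1. The TF-IDF scores are taken as given, since the slides do not state the exact formula. Defining recall against all correct SFs in the gold standard is an assumption, and `f1_at` and `best_threshold` are hypothetical names.

```python
def f1_at(threshold, scored):
    """F1 of keeping only SFs whose score is at least `threshold`.

    `scored` is a list of (score, is_correct) pairs from the gold standard.
    """
    kept = [ok for score, ok in scored if score >= threshold]
    true_pos = sum(kept)
    all_correct = sum(ok for _score, ok in scored)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(kept)
    recall = true_pos / all_correct
    return 2 * precision * recall / (precision + recall)

def best_threshold(scored):
    """Sweep 1.0, 1.2, ..., 8.0 and return the first threshold with the best F1."""
    thresholds = [round(1.0 + 0.2 * i, 1) for i in range(36)]
    return max(thresholds, key=lambda t: f1_at(t, scored))

# Toy data: (score, annotated as correct?)
scored = [(0.5, False), (1.5, False), (2.0, True), (3.0, True), (9.0, True)]
print(best_threshold(scored))
```

Passing threshold 0 to `f1_at` reproduces the unfiltered baseline, because every SF is kept.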
SFs from Common Crawl
• Common Crawl (CC) is the largest publicly available web corpus
• Extraction done on Winter 2014 CC Corpus, in the context of the Web
Data Commons project
• http://webdatacommons.org/ -- extracting and providing for public download
various types of structured data from CC
• Data required a lot of cleaning
• 3M SFs added to our LRD&WAT corpus
• No annotated gold standard: left for future work
• Available at
http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/lrd-cc/
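A minimal sketch of collecting anchor texts of links that point to Wikipedia from a crawled HTML page, using only Python's standard library. It is not the Web Data Commons extraction code: the class name `WikiAnchorParser` is hypothetical, only `en.wikipedia.org` article links are considered, and none of the cleaning the data required is included.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

class WikiAnchorParser(HTMLParser):
    """Collects (Wikipedia page, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._target = None   # page title while inside a Wikipedia link
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        url = urlparse(dict(attrs).get("href") or "")
        if url.netloc == "en.wikipedia.org" and url.path.startswith("/wiki/"):
            self._target = unquote(url.path[len("/wiki/"):])
            self._text = []

    def handle_data(self, data):
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._target is not None:
            anchor = " ".join("".join(self._text).split())  # normalise whitespace
            if anchor:
                self.pairs.append((self._target, anchor))
            self._target = None

parser = WikiAnchorParser()
parser.feed('<p>See <a href="https://en.wikipedia.org/wiki/Mars">the <b>Red</b> Planet</a> '
            'and <a href="http://example.com/">other</a>.</p>')
print(parser.pairs)  # [('Mars', 'the Red Planet')]
```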
Conclusion and Future Work
• Main message
• quality of Wikipedia-based surface forms is often overlooked!
• Contributions
• Gold standard SFs, made available
• 3 filtering strategies: precision improved by > 20% for popular Wikipedia
entities, by > 10% for random entities
• Extracted SFs from Common Crawl corpus
• All data publicly available
• Future work directions
• Task-based evaluation of the resource, further work on the gold standard

More Related Content

What's hot

Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding Form
Jakob .
 
Contexts and Importing in RDF
Contexts and Importing in RDFContexts and Importing in RDF
Contexts and Importing in RDF
Jie Bao
 
DBpedia Citation Challenge. (Not only) Polish Citations in Wikipedia: analysi...
DBpedia Citation Challenge. (Not only) Polish Citations in Wikipedia: analysi...DBpedia Citation Challenge. (Not only) Polish Citations in Wikipedia: analysi...
DBpedia Citation Challenge. (Not only) Polish Citations in Wikipedia: analysi...
Krzysztof Wecel
 
RDA: thinking globally, acting globally
RDA: thinking globally, acting globallyRDA: thinking globally, acting globally
RDA: thinking globally, acting globallyGordon Dunsire
 
OWL: Yet to arrive on the Web of Data?
OWL: Yet to arrive on the Web of Data?OWL: Yet to arrive on the Web of Data?
OWL: Yet to arrive on the Web of Data?Aidan Hogan
 
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Alison Hitchens
 
Memento 101
Memento 101Memento 101
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
Prof. Wim Van Criekinge
 
FedX - Optimization Techniques for Federated Query Processing on Linked Data
FedX - Optimization Techniques for Federated Query Processing on Linked DataFedX - Optimization Techniques for Federated Query Processing on Linked Data
FedX - Optimization Techniques for Federated Query Processing on Linked Data
aschwarte
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011
Peter Mika
 
Lecture linked data cloud & sparql
Lecture linked data cloud & sparqlLecture linked data cloud & sparql
Lecture linked data cloud & sparqlDhavalkumar Thakker
 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFed
Muhammad Saleem
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of Data
Muhammad Saleem
 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
Muhammad Saleem
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
Aidan Hogan
 
Linking the Open Data? by Petko Valtchev
Linking the Open Data? by Petko ValtchevLinking the Open Data? by Petko Valtchev
Linking the Open Data? by Petko Valtchev
Trudat
 
Bringing It All Together: Mapping Continuing Resources Vocabularies for Linke...
Bringing It All Together: Mapping Continuing Resources Vocabularies for Linke...Bringing It All Together: Mapping Continuing Resources Vocabularies for Linke...
Bringing It All Together: Mapping Continuing Resources Vocabularies for Linke...
NASIG
 
Querying Linked Data on Android
Querying Linked Data on AndroidQuerying Linked Data on Android
Querying Linked Data on Android
EUCLID project
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
Muhammad Saleem
 
Information-rich programming in F# (ML Workshop 2012)
Information-rich programming in F# (ML Workshop 2012)Information-rich programming in F# (ML Workshop 2012)
Information-rich programming in F# (ML Workshop 2012)
Tomas Petricek
 

What's hot (20)

Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding Form
 
Contexts and Importing in RDF
Contexts and Importing in RDFContexts and Importing in RDF
Contexts and Importing in RDF
 
DBpedia Citation Challenge. (Not only) Polish Citations in Wikipedia: analysi...
DBpedia Citation Challenge. (Not only) Polish Citations in Wikipedia: analysi...DBpedia Citation Challenge. (Not only) Polish Citations in Wikipedia: analysi...
DBpedia Citation Challenge. (Not only) Polish Citations in Wikipedia: analysi...
 
RDA: thinking globally, acting globally
RDA: thinking globally, acting globallyRDA: thinking globally, acting globally
RDA: thinking globally, acting globally
 
OWL: Yet to arrive on the Web of Data?
OWL: Yet to arrive on the Web of Data?OWL: Yet to arrive on the Web of Data?
OWL: Yet to arrive on the Web of Data?
 
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
 
Memento 101
Memento 101Memento 101
Memento 101
 
Bio ontologies and semantic technologies
Bio ontologies and semantic technologiesBio ontologies and semantic technologies
Bio ontologies and semantic technologies
 
FedX - Optimization Techniques for Federated Query Processing on Linked Data
FedX - Optimization Techniques for Federated Query Processing on Linked DataFedX - Optimization Techniques for Federated Query Processing on Linked Data
FedX - Optimization Techniques for Federated Query Processing on Linked Data
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011
 
Lecture linked data cloud & sparql
Lecture linked data cloud & sparqlLecture linked data cloud & sparql
Lecture linked data cloud & sparql
 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFed
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of Data
 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
Linking the Open Data? by Petko Valtchev
Linking the Open Data? by Petko ValtchevLinking the Open Data? by Petko Valtchev
Linking the Open Data? by Petko Valtchev
 
Bringing It All Together: Mapping Continuing Resources Vocabularies for Linke...
Bringing It All Together: Mapping Continuing Resources Vocabularies for Linke...Bringing It All Together: Mapping Continuing Resources Vocabularies for Linke...
Bringing It All Together: Mapping Continuing Resources Vocabularies for Linke...
 
Querying Linked Data on Android
Querying Linked Data on AndroidQuerying Linked Data on Android
Querying Linked Data on Android
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
 
Information-rich programming in F# (ML Workshop 2012)
Information-rich programming in F# (ML Workshop 2012)Information-rich programming in F# (ML Workshop 2012)
Information-rich programming in F# (ML Workshop 2012)
 

Viewers also liked

DBpedia InsideOut
DBpedia InsideOutDBpedia InsideOut
DBpedia InsideOut
Cristina Pattuelli
 
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Stefan Dietze
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Marieke van Erp
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Sören Auer
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of Data
Sebastian Hellmann
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked Data
Olaf Hartig
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
Heiko Paulheim
 
Applying Linked Open Data to Public Procurement
Applying Linked Open Data to Public ProcurementApplying Linked Open Data to Public Procurement
Applying Linked Open Data to Public Procurement
Jindřich Mynarz
 
Exploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queriesExploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queries
Luiz Henrique Zambom Santana
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Heiko Paulheim
 
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data CloudA Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
Syed Muhammad Ali Hasnain
 
Unsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product DescriptionUnsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product Description
Rakuten Group, Inc.
 
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and ExecutionFedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
Syed Muhammad Ali Hasnain
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Olaf Hartig
 
RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031
kwangsub kim
 
Querying Linked Data with SPARQL
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQL
Olaf Hartig
 
The Future is Federated
The Future is FederatedThe Future is Federated
The Future is Federated
Ruben Verborgh
 
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Semantic Web Company
 

Viewers also liked (20)

DBpedia InsideOut
DBpedia InsideOutDBpedia InsideOut
DBpedia InsideOut
 
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
Linked Data Fragments
Linked Data FragmentsLinked Data Fragments
Linked Data Fragments
 
NLP todo
NLP todoNLP todo
NLP todo
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of Data
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked Data
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
Applying Linked Open Data to Public Procurement
Applying Linked Open Data to Public ProcurementApplying Linked Open Data to Public Procurement
Applying Linked Open Data to Public Procurement
 
Exploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queriesExploiting the query structure for efficient join ordering in SPARQL queries
Exploiting the query structure for efficient join ordering in SPARQL queries
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
 
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data CloudA Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
 
Unsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product DescriptionUnsupervised Extraction of Attributes and Their Values from Product Description
Unsupervised Extraction of Attributes and Their Values from Product Description
 
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and ExecutionFedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
 
RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031
 
Querying Linked Data with SPARQL
Querying Linked Data with SPARQLQuerying Linked Data with SPARQL
Querying Linked Data with SPARQL
 
The Future is Federated
The Future is FederatedThe Future is Federated
The Future is Federated
 
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
 

Similar to Gathering Alternative Surface Forms for DBpedia Entities

DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinAnja Jentzsch
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Anja Jentzsch
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly ResourcesRobert Sanderson
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
kelbedweihy
 
New Directions in Information Organization: A Linked Data Model with BIBFRAME
New Directions in Information Organization: A Linked Data Model with BIBFRAMENew Directions in Information Organization: A Linked Data Model with BIBFRAME
New Directions in Information Organization: A Linked Data Model with BIBFRAME
SharonYang
 
Schema.org - An Extending Influence
Schema.org - An Extending InfluenceSchema.org - An Extending Influence
Schema.org - An Extending Influence
Richard Wallis
 
Intro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & MuseumsIntro to Linked Open Data in Libraries, Archives & Museums
Intro to Linked Open Data in Libraries, Archives & Museums
Jon Voss
 
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
WikiAsp: A Dataset for Multi-domain Aspect-based SummarizationWikiAsp: A Dataset for Multi-domain Aspect-based Summarization
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
Hiroaki Hayashi
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsKorea Sdec
 
Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion
Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data FusionLearning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion
Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion
Volha Bryl
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF Data
Heiko Paulheim
 
Thesis Proposal: User Application Profiles for Publishing Linked Data in HTM...
Thesis Proposal: User Application Profiles for Publishing Linked Data in  HTM...Thesis Proposal: User Application Profiles for Publishing Linked Data in  HTM...
Thesis Proposal: User Application Profiles for Publishing Linked Data in HTM...
Sean Petiya
 
Schema.org - Extending Benefits
Schema.org - Extending BenefitsSchema.org - Extending Benefits
Schema.org - Extending Benefits
Richard Wallis
 
Machine-In-The-Loop for Knowledge Discovery
Machine-In-The-Loop for Knowledge DiscoveryMachine-In-The-Loop for Knowledge Discovery
Machine-In-The-Loop for Knowledge Discovery
odsc
 
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
NISO/NFAIS Joint Virtual Conference: Connecting the Library to the Wider Worl...
National Information Standards Organization (NISO)
 
Linked Data Basics
Linked Data BasicsLinked Data Basics
Linked Data Basics
Anja Jentzsch
 
Importing life science at a into Neo4j
Importing life science at a into Neo4jImporting life science at a into Neo4j
Importing life science at a into Neo4j
Simon Jupp
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
Prateek Jain
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedSören Auer
 

Similar to Gathering Alternative Surface Forms for DBpedia Entities (20)

DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly Resources
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
New Directions in Information Organization: A Linked Data Model with BIBFRAME
Gathering Alternative Surface Forms for DBpedia Entities

  • 1. Gathering Alternative Surface Forms for DBpedia Entities Volha Bryl University of Mannheim, Germany  Springer Nature Christian Bizer, Heiko Paulheim University of Mannheim, Germany NLP & DBpedia @ ISWC 2015, Bethlehem, USA, October 11, 2015
  • 2. Why you need Surface Forms
      • A surface form (SF) of an entity is a collection of strings by which it can be referred to: synonyms, alternative names, etc.
      • Used to support many NLP tasks: co-reference resolution, entity linking, disambiguation
  • 3. Why you need Surface Forms
      “Billionaire Elon Musk has spelled out how he plans to create temporary suns over Mars in order to heat the Red Planet. Dismissing earlier comments that he intended to nuke the planet’s surface, he says he wants to create aerial explosions to heat it up.”
      To link the three mentions, your machine should know that “Red Planet” is an alternative name for Mars, and that Mars can be referred to just by its “type”: planet.
  • 4. Surface Forms from Wiki(DB)pedia
      • Some of Wikipedia’s (hence, DBpedia’s) crowd-sourced content looks quite like surface forms
          • Page titles
          • Redirects
              • Account for alternative names, word forms (e.g. plurals), closely related words, abbreviations, alternative spellings, likely misspellings, subtopics
          • Disambiguation pages
              • There are 10+ Bethlehems in the US, according to https://en.wikipedia.org/wiki/Bethlehem_(disambiguation)
          • Anchor texts of links between wiki pages
              • Rendered: Named after the Roman god of war, it is often referred to as the “Red Planet”...
              • Source: Named after the [[Mars (mythology)|Roman god of war]], it is often referred to as the "Red Planet"
          • …additionally, we use anchor texts of links from external pages to Wikipedia
  • 5. Surface Forms from Wiki(DB)pedia
      • Not a new idea
          • BabelNet, DBpedia Spotlight, … [see our paper for more links]
      [Figure: Mars in BabelNet]
  • 6. Surface Forms from Wiki(DB)pedia
      • Problem: Quality
          • …it is not only that quality is a problem, but also that it has never been assessed or addressed
          • Reason 1: good quality of Wikipedia content is taken for granted
          • Reason 2: the hope is that NLP algorithms won’t be influenced by noise
  • 7. Surface Forms from Wiki(DB)pedia
      • Problem: Quality – Why?
          • By adding a redirect or an anchor text for an internal Wikipedia link, a Wikipedia editor might mean not only “same as” or “also known as”, but also “related to”, “contains”, etc.
          • Both variants serve the purpose of pointing to the correct wiki page
  • 8. Solution: Focus on Quality
      • Step 1: Extract
          • We extract SFs from Wikipedia labels, redirects, disambiguations, and anchor texts of internal wiki-links
      • Step 2: Evaluate
          • We create a gold standard to evaluate SF quality
      • Step 3: Filter
          • We implement three filters to improve SF quality
      • Bonus: More SFs
          • We extract SFs from anchor texts of Wiki links found in the Common Crawl 2014 corpus
      • All datasets are available at http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/
  • 9. SFs Dataset Statistics
      • LRD = Labels, Redirects, Disambiguations
          • Extracted from DBpedia dumps
      • WAT = Wikipedia Anchor Texts
          • Extracted by a new DBpedia extractor (based on PageLinksExtractor)
  • 10. Gold Standard
      • Manual annotation, 1 annotator, 2 subsets
      • Popular subset: 34 manually selected popular entities of different types
          • Denmark, Berlin, Apple Inc., Animal Farm, Michael Jackson, Star Wars, Diego Maradona, Mars, etc.
          • ~82 SFs per entity, linked from other Wiki pages 813,736 times
      • Random subset: 81 randomly selected entities, each having at least 5 SFs
          • Andy_Zaltzman, Bell AH-1 SuperCobra, Biarritz, Castellum, Firefox (film), Kipchak languages, ParisTech, Psychokinesis, etc.
          • ~13 SFs per entity, linked from other Wiki pages 14,760 times
      • Available at http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/gold/
  • 11. Gold Standard
      • Types of annotations
          • correct (“the eternal city” for Rome)
          • contained (“Google Japan” for Google), contains (“Turkey” for Istanbul)
          • type of (“the city” for Rome)
          • partial (“Diego” for Diego Maradona)
          • related (“Google Blog” for Google)
          • wrong (“during World War I” for United States)
  • 12. Evaluation: How many correct SFs?
      • SFs extracted from labels, redirects, disambiguations
          • correct, popular subset: 66.8%
          • correct, random subset: 86.6%
      • SFs extracted from Wikipedia anchor texts
          • correct, popular subset: 38.5%
          • correct, random subset: 70.7%
      • Combined dataset
          • correct, popular subset: 45.7%
          • correct, random subset: 75%
  • 13. (1) Filtering: String Patterns
      • Data analysis: there are patterns that wrong SFs follow
          • URLs: contain .com or .net (“Berlin-china.net” for Berlin)
          • of-phrases, with exceptions for city of, state of, and the like (“Issues of Toronto” for Toronto)
          • in-phrases (“Historical sites in Berlin” for Berlin)
          • and-phrases (“Tom Cruise and Katie Holmes” for Tom Cruise)
          • list-of (“List of Toronto MPs and MPPs” for Toronto)
      • Increase in precision
          • popular subset: 1.33%
          • popular subset, LRD only: 3.75%
          • random subset: less than 1%
  • 14. (2) Filtering: Wikidata
      • Observation: some SFs are entities in their own right in other languages
          • E.g. “Neckarau”, a city area of Mannheim, redirects to Mannheim in English Wikipedia, but has its own page in German Wikipedia
      • Implementation: use DBpedia-Wikidata dumps, released in May 2015
          • Check whether an SF exactly matches or is close (Levenshtein distance) to any of the labels of Wikidata entities that do not have an English Wikipedia page but have pages in other languages
      • Increase in precision
          • 0.5% compared to pattern-based filtering
          • 1.5% for SFs extracted only from LRD
  • 15. (3) Filtering: Frequency Scores
      • For SFs extracted from anchor texts, frequencies are available, so TF-IDF scores can be computed
      • Determining the threshold: values from 1.0 to 8.0 with a step of 0.2 evaluated
          • Two thresholds selected, with the highest values of F1: 1.8 and 2.6
          • Threshold 0 (no filtering) used as baseline
      • Increase in precision
          • 20% for popular subset, 10% for random subset
      * Filtering done on the dataset to which pattern- and Wikidata-based filters are already applied
  • 16. SFs from Common Crawl
      • Common Crawl (CC) is the largest publicly available web corpus
      • Extraction done on the Winter 2014 CC corpus, in the context of the Web Data Commons project
          • http://webdatacommons.org/: extracting and providing for public download various types of structured data from CC
      • Data required a lot of cleaning
      • 3M SFs added to our LRD&WAT corpus
      • No annotated gold standard: left for future work
      • Available at http://data.dws.informatik.uni-mannheim.de/dbpedia/nlp2014/lrd-cc/
  • 17. Conclusion and Future Work
      • Main message
          • quality of Wikipedia-based surface forms is often overlooked!
      • Contributions
          • Gold standard SFs, made available
          • 3 filtering strategies: precision improved by > 20% for popular Wikipedia entities, by > 10% for random entities
          • Extracted SFs from Common Crawl corpus
          • All data publicly available
      • Future work directions
          • Task-based evaluation of the resource, further work on the gold standard
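
The three filters from slides 13 to 15 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `max_dist` cutoff of 1, the rule that a frequency score at or above the threshold is kept, and the toy scores are all hypothetical. Only the patterns, the two named of-phrase exceptions, and the threshold value 1.8 come from the slides.

```python
import re

# Exceptions named on slide 13; its "and the like" is deliberately left unfilled.
OF_EXCEPTIONS = ("city of", "state of")


def matches_wrong_pattern(sf: str) -> bool:
    """Filter 1: True if the surface form follows one of the listed string patterns."""
    s = sf.lower().strip()
    if ".com" in s or ".net" in s:              # URLs
        return True
    if s.startswith("list of "):                # list-of
        return True
    if re.search(r"\bof\b", s) and not s.startswith(OF_EXCEPTIONS):
        return True                             # of-phrases, minus the exceptions
    return re.search(r"\b(in|and)\b", s) is not None  # in-phrases, and-phrases


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def near_foreign_label(sf: str, foreign_labels, max_dist: int = 1) -> bool:
    """Filter 2: True if the SF matches, or is close to, a label of an entity
    that has no English page. max_dist is an assumed cutoff."""
    s = sf.lower()
    return any(levenshtein(s, label.lower()) <= max_dist for label in foreign_labels)


def filter_surface_forms(scored_sfs: dict, foreign_labels, threshold: float = 1.8):
    """Apply filters 1 and 2, then filter 3: keep SFs whose precomputed
    frequency score is at least `threshold` (assumed direction)."""
    return [
        sf for sf, score in scored_sfs.items()
        if not matches_wrong_pattern(sf)
        and not near_foreign_label(sf, foreign_labels)
        and score >= threshold
    ]


if __name__ == "__main__":
    # Toy candidates for the entity Mannheim; all scores are made up.
    candidates = {
        "Mannheim": 5.0,
        "City of Mannheim": 2.0,
        "Neckarau": 2.7,
        "Universities in Mannheim": 2.9,
        "Monnem": 0.4,
    }
    print(filter_surface_forms(candidates, foreign_labels=["Neckarau"]))
```

The pattern rules are intentionally crude: an and-phrase check like this would also reject legitimate names that contain "and", which is one reason to measure the effect of each filter against an annotated gold standard rather than assume it helps.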