SwissLink
High-Precision, Context-Free Entity Linking
Exploiting Unambiguous Labels
Roman Prokofyev, Michael Luggen, Djellel Eddine Difallah, Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg, Switzerland
Entity Linking
“In natural language processing, entity linking, [...] is the task of determining
the identity of entities mentioned in text.”
https://en.wikipedia.org/wiki/Entity_linking
The identity of an entity is commonly defined as an entry in a Knowledge
Base (KB).
Entity linking is usually solved in a multi-step process: Named Entity Recognition
(NER), followed by Candidate Selection, and finally Disambiguation.
Entity Linking
1. Named Entity Recognition (NER)
Distinguish between ordinary words and defined concepts, also known as
named entities. Often involves a Part of Speech (POS) tagger.
2. Candidate Selection
Selecting possible candidates from the target Knowledge Base (where
entities are defined).
3. Disambiguation
Deciding which candidate is the correct identity corresponding to the
mention of a Named Entity.
Entity Linking
1. Named Entity Recognition (NER)
“It is a blast to visit Adam once more.”
2. Candidate Selection
Adam -> Adam (Name), Adam (City) in Oman, Amsterdam
3. Disambiguation
Adam -> https://en.wikipedia.org/wiki/Amsterdam
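The three steps on this slide can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the candidate table, the context cue words, the entity identifiers, and all function names are hypothetical, and the capitalization heuristic merely stands in for a real NER component.

```python
import re

# Hypothetical toy knowledge base: surface form -> candidate entity identifiers.
CANDIDATES = {
    "Adam": ["Adam_(name)", "Adam_(city_in_Oman)", "Amsterdam"],
}

# Hypothetical context cue words per entity, used only by this toy disambiguator.
CONTEXT_CUES = {
    "Adam_(name)": {"met", "said", "brother"},
    "Adam_(city_in_Oman)": {"oman", "desert"},
    "Amsterdam": {"visit", "canals", "city"},
}

def recognize(text):
    """Step 1 (NER stand-in): capitalized tokens that are not sentence-initial."""
    tokens = re.findall(r"\w+", text)
    return [t for i, t in enumerate(tokens) if i > 0 and t[0].isupper()]

def select_candidates(mention):
    """Step 2: look up all candidate entities for a mention."""
    return CANDIDATES.get(mention, [])

def disambiguate(candidates, text):
    """Step 3: pick the candidate sharing the most cue words with the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return max(
        candidates,
        key=lambda c: len(CONTEXT_CUES.get(c, set()) & words),
        default=None,
    )

def link(text):
    """Run the three steps and return a mention -> entity mapping."""
    return {m: disambiguate(select_candidates(m), text) for m in recognize(text)}

result = link("It is a blast to visit Adam once more.")
```

Here the word "visit" is the only cue that separates the three candidates, which shows how much the classic pipeline depends on context.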
Motivation: High-precision context-free entity linking
● Certain applications require high-precision linked entities
○ Interactive applications where humans review results
○ Machine learning: training predictive models may require high-precision annotated text (no overfitting)
● Context-free
○ Works with any type of input: text, tweets, search queries
○ But limited to unambiguous labels
The F1 score strikes a balance (harmonic mean) between precision and recall.
This is not necessarily the best optimization for the task at hand.
[Figure: precision, recall, and F1 score]
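As a small worked example of why F1 may not be the right target, the sketch below computes the harmonic mean for two hypothetical systems. The function name is illustrative; the first pair of values echoes the precision and recall reported in the deck's evaluation discussion, and the second pair is invented for contrast.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# High-precision system: precision 0.95, recall 0.45.
high_precision_f1 = f1_score(0.95, 0.45)

# Invented balanced system: precision 0.70, recall 0.70.
balanced_f1 = f1_score(0.70, 0.70)
```

The balanced system scores a higher F1 (0.70 versus roughly 0.61) even though its precision is much lower, so optimizing F1 alone would favor it over the high-precision system.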
Motivation: Categories of links to Wikipedia
What labels are used to link to entities (as Wikipedia pages) on the web?
● Link by the most common label: "web browser" -> <Web_browser> (381’623 times)
● Link by context: "divided into three subgroups: East, West, and South" -> <East_Slavic_languages>
● Link by reference: "Wikipedia" -> <Angelina_Jolie> (16’333 times)
● Erroneous link: "Oregon" -> <University_of_Oregon> (incorrectly linked entity even when considering the context)
Motivation: Prior probability scores
● Most important feature when not considering context
● Conditional probability P(link|label)
● Problems:
○ Does not necessarily capture ambiguity:
Adam -> Adam (Name), Adam (City) in Oman, Amsterdam
○ Does not take categories into account:
Wikipedia -> Angelina_Jolie [16’333]
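A minimal sketch of how the prior P(link|label) could be estimated from (label, entity) link pairs in an annotated corpus. The function name, the entity identifiers, and the link counts are assumptions made up for this illustration; they are not taken from the slides.

```python
from collections import Counter, defaultdict

def prior_probabilities(anchor_links):
    """Estimate P(entity | label) from an iterable of (label, entity) pairs."""
    counts = defaultdict(Counter)
    for label, entity in anchor_links:
        counts[label][entity] += 1
    return {
        label: {entity: n / sum(c.values()) for entity, n in c.items()}
        for label, c in counts.items()
    }

# Invented counts: 6 links to Amsterdam, 3 to the name, 1 to the city in Oman.
links = (
    [("Adam", "Amsterdam")] * 6
    + [("Adam", "Adam_(name)")] * 3
    + [("Adam", "Adam_(city_in_Oman)")] * 1
)
priors = prior_probabilities(links)
```

With these invented counts the top entity gets a prior of 0.6. The score by itself does not say whether the remaining 0.4 comes from genuinely different meanings of the label or from noisy links, which is the first problem listed above.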
Method (Problem)
Problem Formulation.
Given an arbitrary textual document $I_D$ as input, identify all named-entity
substrings $\{l_1, \ldots, l_k\}$ and link them to their respective entities.
Effectively, our methods return as output a set of label-entity pairs
$O_D = \{(l_1, e_z), \ldots, (l_k, e_x)\}$.
Method (Different Overall Approach)
Common approach:
Named entity recognition -> candidate selection -> disambiguation
Context-free approach:
Extract surface forms (from a KB or an annotated corpus) -> clean and catalog -> fast string matching
Surface form: a string representing an entity in a text.
Annotated corpus: e.g. Wikipedia articles, Common Crawl
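The final "fast string matching" step can be pictured as a dictionary lookup over token n-grams. The slides do not say which matching algorithm is used, so the greedy longest-match strategy, the function name, the `max_len` parameter, and the tiny catalog below are all assumptions for illustration.

```python
def link_text(text, catalog, max_len=4):
    """Greedy longest-match lookup of token n-grams in a label -> entity catalog."""
    tokens = text.split()
    pairs = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = " ".join(tokens[i:i + n]).strip(".,;:!?\"'")
            if label in catalog:
                pairs.append((label, catalog[label]))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no label starts here; move on by one token
    return pairs

# Hypothetical catalog of unambiguous surface forms.
catalog = {
    "web browser": "Web_browser",
    "University of Fribourg": "University_of_Fribourg",
}
pairs = link_text("A web browser developed at the University of Fribourg.", catalog)
```

No context is inspected at any point, so the quality of the output depends entirely on the catalog containing only unambiguous labels. The lookup in this sketch is case-sensitive.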
Method (Catalog)
DBpedia
DBpedia labels can be considered as a catalog after the removal of ambiguous
labels. Downside: The labels in DBpedia are rather sparse.
Wikipedia
The internal links of Wikipedia are a good source of surface forms with links to
entities (Wikipedia pages). Downside: the different categories of links (by
context, by reference, erroneous) introduce noise.
Method
Ratio method
Decides which surface forms are ambiguous labels that cannot be
considered without context.
Percentile method
Removes the long tail and then readjusts the weights to get better recall.
Evaluation
Curated ground truth based on Wikipedia articles allows us to compare with
manual annotations in Wikipedia (30 randomly sampled articles).
● Ratio method: low recall
● Ratio+Percentile 99: best
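Comparing a system's output against manual annotations amounts to set overlap between predicted and gold label-entity pairs. The sketch below assumes exact matching of pairs; the function name and the example pairs are hypothetical, and the actual evaluation referenced in the deck may match annotations differently.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted (label, entity) pairs against a gold set."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Invented gold annotations and system output.
gold = {
    ("web browser", "Web_browser"),
    ("Oregon", "Oregon"),
    ("Amsterdam", "Amsterdam"),
    ("Fribourg", "Fribourg"),
}
predicted = {
    ("web browser", "Web_browser"),
    ("Amsterdam", "Amsterdam"),
}
precision, recall = precision_recall(predicted, gold)
```

In this invented case every predicted pair is correct (precision 1.0) but only half of the gold pairs are found (recall 0.5), the kind of trade-off a high-precision method accepts.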
Evaluation (Discussion)
● Increasing the ratio introduces more ambiguous labels, which has a direct impact on precision
● The percentile method balances this effect by separating the ambiguity from the popularity of the entities
● In general, we observe that the Percentile-Ratio method with 99-Percentile and 10-Ratio strikes a good balance between high-precision results (>95%) and reasonable recall (45%, 1309 entities)
High-Precision, Context-Free Entity Linking
Exploiting Unambiguous Labels
Links
Ground truth: https://github.com/eXascaleInfolab/Wikipedia30
Methods: https://github.com/eXascaleInfolab/kilogram
Evaluation: http://w3id.org/gerbil/experiment?id=201604300040
