Language and Domain
Independent Entity Linking
with Quantified Collective
Validation
Han Wang, Jin Guang Zheng, Xiaogang Ma, Peter Fox, and Heng Ji
EMNLP2015
Presented by: Shuangshuang Zhou
Inui&Okazaki Lab. Tohoku University
# Figures in these slides are borrowed from the authors' paper and poster
An example to explain the task
One day after released by the Patriots, Florida born Caldwell
visited the Jets. ...
The New York Jets have six receivers on the roster: Cotchery,
Coles, ...
Linked KB entities: New England Patriots, Reche Caldwell, Jerricho Cotchery,
Laveranues Coles, New York Jets
9/12/16 2
Motivation and Contribution
• “Most of the previous research extensively exploited the
linguistic features of the source documents in a supervised or
semi-supervised way”.
• Quantified Collective Validation (QCV) can be applied to a new
language or domain:
  • It can work with limited linguistic resources.
  • It conducts a more deliberate study of the KB.
  • It aligns co-occurring mentions to the KB collectively, with a
  further step that quantitatively differentiates entity relations
  in the KB.
Approaches - Overview
Candidate Ranking
(two ranking steps + Quantified Collective Validation)
Salience Ranking (SR): measures candidates’ importance without the
context, using information entropy.
Context Similarity Ranking (CS): measures the structural similarity
between the mention context graph and the candidate graphs, using
Jaccard similarity.
Candidate Graph Collective Validation (CV)
Approaches – Salience Ranking
Measures a candidate’s (entity’s) importance without the context, using
information entropy.
Tuple format in the KB: (eh, r, et)
In the salience formula (shown as a figure on the slide), R(c) is the
relation set for c in the KB; H(r) is the entropy term for relation r;
Et(r) is the tail entity set with c being the head entity and r being
the connecting relation in the KB; L(et) denotes the cardinality of the
tail entity set with et being the head entity in the KB. Sa(c) is
recursively computed until convergence.
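As a rough illustration of the idea on this slide, the sketch below computes an entropy weight per relation and then iterates a salience score over (eh, r, et) tuples until it stops changing. The slide's own equations are not available in this text, so the entropy definition, the damped and normalized update rule, the toy triples, and the function names `relation_entropy` and `salience` are all assumptions made for illustration, not the authors' formulation.

```python
import math
from collections import Counter, defaultdict

# Toy KB in (head, relation, tail) form; the triples are illustrative only.
TRIPLES = [
    ("Reche_Caldwell", "team", "New_England_Patriots"),
    ("Jerricho_Cotchery", "team", "New_York_Jets"),
    ("Laveranues_Coles", "team", "New_York_Jets"),
    ("Reche_Caldwell", "birthPlace", "Florida"),
    ("New_York_Jets", "league", "NFL"),
    ("New_England_Patriots", "league", "NFL"),
]


def relation_entropy(triples):
    """Shannon entropy of each relation's tail-entity distribution (assumed H(r))."""
    tails = defaultdict(Counter)
    for _, relation, tail in triples:
        tails[relation][tail] += 1
    entropy = {}
    for relation, counts in tails.items():
        total = sum(counts.values())
        entropy[relation] = -sum(
            (n / total) * math.log(n / total) for n in counts.values()
        )
    return entropy


def salience(triples, damping=0.85, tol=1e-9, max_iter=100):
    """Iterate an entropy-weighted salience score until it converges.

    Assumed update: a head entity inherits salience from its tail entities,
    weighted by H(r) and divided by L(et), the number of tails that et has
    as a head. Scores are renormalized to sum to 1 after every pass.
    """
    weights = relation_entropy(triples)
    entities = sorted({e for head, _, tail in triples for e in (head, tail)})
    out_degree = Counter(head for head, _, _ in triples)
    score = {e: 1.0 / len(entities) for e in entities}
    for _ in range(max_iter):
        new = {e: (1.0 - damping) / len(entities) for e in entities}
        for head, relation, tail in triples:
            # max(..., 1) avoids dividing by zero for entities with no tails.
            share = score[tail] / max(out_degree[tail], 1)
            new[head] += damping * weights[relation] * share
        total = sum(new.values())
        new = {e: value / total for e, value in new.items()}
        delta = max(abs(new[e] - score[e]) for e in entities)
        score = new
        if delta < tol:
            break
    return score


scores = salience(TRIPLES)
```

Under these assumptions a relation whose tails are all identical gets zero entropy and contributes nothing, which is one way to read "importance measured by information entropy"; other normalizations are equally plausible.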
Approaches – Context Similarity Ranking
Measures the structural similarity between the mention context graph
and each candidate graph using Jaccard similarity.
It checks whether two co-occurring mentions have their entity
referents connected by some relation in the KB.
The more structurally similar a candidate graph Gc^i is to its mention
context graph Gm, the better the candidates in Gc^i represent their
mentions in Gm.
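One way to read "structural similarity" here is as the Jaccard index between the edge set of the mention context graph and the edge set of a candidate graph, with candidate vertices written under their mention labels. The sketch below assumes that reading; the helper names and the toy edges are illustrative and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def undirected(edges):
    """Normalize (u, v) pairs so that edge direction is ignored."""
    return {frozenset(edge) for edge in edges}


# Edges of the mention context graph Gm (mentions that co-occur).
mention_graph = undirected([
    ("Patriots", "Caldwell"),
    ("Caldwell", "Jets"),
    ("Jets", "Cotchery"),
    ("Jets", "Coles"),
])

# Two hypothetical candidate graphs over the same mention labels: an edge is
# present only when the chosen entities are related in the KB.
candidate_a = undirected([
    ("Patriots", "Caldwell"),
    ("Jets", "Cotchery"),
    ("Jets", "Coles"),
])
candidate_b = undirected([("Jets", "Coles")])

similarity_a = jaccard(mention_graph, candidate_a)
similarity_b = jaccard(mention_graph, candidate_b)
```

Here `candidate_a` keeps three of the four mention edges and `candidate_b` keeps one, so the first candidate graph would rank higher under this measure.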
Approaches – Mention Context Graph
Gm is a light-weight source context
representation which simply involves
mention co-occurrence.
• There will be an edge between two
mention vertices if both of them fall into
a context window in the source
document.
• Two mention vertices will be connected
via a dashed edge if they are
coreferential but are not located in the
same context window.
Example source text: “One day after released by the Patriots, Florida born
Caldwell visited the Jets. ... The New York Jets have six receivers on the
roster: Cotchery, Coles, ...”
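The two edge rules above can be sketched as follows. The slide does not say how the context window is measured, so this sketch assumes a window counted in sentences; the function name `build_mention_graph`, the `(surface, sentence_index)` input format, and the "solid"/"dashed" labels are likewise assumptions for illustration.

```python
from itertools import combinations


def build_mention_graph(mentions, window=1, coref=()):
    """Build Gm as a dict mapping an unordered mention pair to an edge type.

    mentions: list of (surface, sentence_index) pairs.
    window:   two mentions share a context window if their sentence
              indices differ by less than this value (assumed definition).
    coref:    pairs of surfaces known to be coreferential.
    """
    edges = {}
    for (m1, s1), (m2, s2) in combinations(mentions, 2):
        if abs(s1 - s2) < window:
            edges[frozenset((m1, m2))] = "solid"
    for m1, m2 in coref:
        # Coreferential mentions outside a shared window get a dashed edge;
        # setdefault leaves an existing solid edge untouched.
        edges.setdefault(frozenset((m1, m2)), "dashed")
    return edges


MENTIONS = [
    ("Patriots", 0), ("Caldwell", 0), ("Jets", 0),
    ("New York Jets", 1), ("Cotchery", 1), ("Coles", 1),
]
graph = build_mention_graph(MENTIONS, window=1, coref=[("Jets", "New York Jets")])
```

With `window=1`, only mentions in the same sentence are joined by solid edges, and the dashed edge ties "Jets" in the first sentence to "New York Jets" in the second.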
Approaches – KB Graph
GK is a weighted graph that consists of a set of vertices representing the
entities and a set of directed edges labeled with relations between entities.
A “wiki link” relation is added between two entities if one of them appears in the
Wikipedia article of the other.
Approaches – Candidate Graphs
Gc is a set of graphs, each of which
represents one collective linking solution for
the given mentions.
• Two vertices are connected if they are
also connected in GK by some relation r
and their mentions are connected in Gm.
The edge label r is transferred from GK.
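The construction rule above can be sketched by enumerating one candidate per mention and keeping an edge only when the two mentions are connected in Gm and the two chosen entities are related in GK. The toy KB, the candidate lists, and the names `kb_relation` and `candidate_graphs` are hypothetical, and exhaustive enumeration is used only because the example is tiny.

```python
from itertools import product

# Hypothetical KB edges: (entity, entity) -> relation label.
KB = {
    ("Reche_Caldwell", "New_England_Patriots"): "team",
    ("Jerricho_Cotchery", "New_York_Jets"): "team",
}


def kb_relation(a, b, kb):
    """Return the relation label linking a and b in either direction, or None."""
    return kb.get((a, b)) or kb.get((b, a))


def candidate_graphs(candidates, mention_edges, kb):
    """Return one (assignment, edges) pair per combination of candidates."""
    mentions = sorted(candidates)
    graphs = []
    for choice in product(*(candidates[m] for m in mentions)):
        assignment = dict(zip(mentions, choice))
        edges = {}
        for m1, m2 in mention_edges:
            relation = kb_relation(assignment[m1], assignment[m2], kb)
            if relation is not None:
                # The edge label is carried over from the KB.
                edges[(assignment[m1], assignment[m2])] = relation
        graphs.append((assignment, edges))
    return graphs


CANDIDATES = {
    "Patriots": ["New_England_Patriots"],
    "Caldwell": ["Reche_Caldwell", "Andre_Caldwell"],
    "Jets": ["New_York_Jets", "Winnipeg_Jets"],
    "Cotchery": ["Jerricho_Cotchery"],
}
MENTION_EDGES = [("Patriots", "Caldwell"), ("Caldwell", "Jets"), ("Jets", "Cotchery")]

graphs = candidate_graphs(CANDIDATES, MENTION_EDGES, KB)
best_assignment, best_edges = max(graphs, key=lambda g: len(g[1]))
```

The number of candidate graphs grows as the product of the candidate list sizes, which is why the ranking steps that prune candidates matter in practice.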
Approaches – Candidate Graph Collective Validation
• Assumption: a “tighter” relation between two candidates is more
likely to be an appropriate representation of the relation between
their co-occurring mentions in the source context.
• CV quantitatively differentiates the types of relations using the
relation weights calculated in GK.
• The validation score combines three terms: salience ranking,
context similarity ranking, and the added effect of “tighter” relations.
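To make the three-term combination concrete, here is a minimal sketch that scores a candidate graph as a weighted sum of mean salience, context similarity, and the summed weights of its KB relations. The slide's actual formula is not available in this text, so the additive form, the coefficients `alpha`, `beta`, `gamma`, the function name, and the example relation weights are all assumptions.

```python
def validation_score(saliences, context_similarity, edge_relations,
                     relation_weight, alpha=1.0, beta=1.0, gamma=1.0):
    """Combine the three signals for one candidate graph (assumed additive form).

    saliences:          salience score of each chosen candidate.
    context_similarity: similarity between the candidate graph and Gm.
    edge_relations:     relation labels on the candidate graph's edges.
    relation_weight:    weight per relation type; larger means "tighter".
    """
    mean_salience = sum(saliences) / len(saliences)
    tightness = sum(relation_weight.get(r, 0.0) for r in edge_relations)
    return alpha * mean_salience + beta * context_similarity + gamma * tightness


# Hypothetical weights: a specific relation counts as tighter than a wiki link.
RELATION_WEIGHT = {"team": 0.9, "wikiLink": 0.1}

score_tight = validation_score([0.5, 0.5], 0.5, ["team"], RELATION_WEIGHT)
score_loose = validation_score([0.5, 0.5], 0.5, ["wikiLink"], RELATION_WEIGHT)
```

Two candidate graphs with the same salience and context similarity are separated only by the relation term here, which mirrors the stated assumption that tighter relations are more likely to be correct.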
Experiments - Generic English Corpora
Experiments on TAC-KBP 2013 linkable mentions
Baseline:
Compared with the top 3 supervised and top 3 unsupervised
systems from TAC-KBP 2013
Error Analysis:
1) context capturing is deficient
2) the coreference rules are too simple
3) certain relations are missing in the KB.
(Zheng et al., 2014)
Experiments - Generic English Corpora
• SR outperforms the best KBP unsupervised system (0.632).
• Although CS did not produce many more correct linking results than
SR, it promoted a great number of good candidates to the top of the
ranking list.
• CS is deficient in recognizing subtle contextual differences among
similar candidates (those of the same type).
Experiments - Generic Chinese Corpora
• Fahrni et al. (2012) used over 20 fine-tuned features and many
linguistic resources.
• Error Analysis:
  • Low recall when mapping candidates between English and Chinese
Experiments – Specific domain
• There is a slight improvement in biomedical science because
candidates of the related mentions mostly have similar relations in
the KB.
• First study on the earth science domain
• Error Analysis:
  • Salience ranking introduces bias when a generic KB is used.
  • Some relations are not clearly defined in DBpedia.
Conclusion and Future work
• QCV relies minimally on linguistic analysis and makes deep use of
structured KBs.
• The authors demonstrated a high-performance EL approach
that can be migrated to new languages and domains.
• They plan to extract mention context better and to incorporate the
impact of more distant KB entities, beyond just the direct neighbors.
Thoughts (I)
• For conventional collective ranking in unsupervised EL
approaches, time complexity is a significant problem: the
upper bound of the computing time to link all mentions in a
document is O(nm*nc*nnc*nnm).
• Their intensive analysis of each experimental result is worth
learning from.
• Since their method is less reliant on the source document, it
could also be applied to short texts (tweets, search engine
queries).
Thoughts (II)
• Their approach may not be effective when there are few
co-occurring mentions.
• We would like to see their system's performance on more generic
English corpora.
• Their method works on linkable mentions, but it cannot handle
unlinkable mentions (NILs).
• For a new language and a new domain they used the same
KB (DBpedia), and their system's performance was affected
by the structured KB, so the performance needs to be
verified with new KBs.
