http://www.lattice.cnrs.fr | Demonstrations at NAACL HLT 2015, Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies, Denver, Colorado (US), May 31-June 5
Expression extraction should be improved and implemented in open-source software. The careful use of natural language processing algorithms could provide better filtering metrics and support for expression merging.
Manual filtering is crucial because it not only reduces the entities to a set whose size is appropriate for analysis, but also recovers important entities that automatic filtering may have excluded.
Expressed in [1] by social scientists from médialab (Paris Institute of Political Studies, SciencesPo)
LATTICE Lab
CNRS – Ecole Normale Supérieure
U Paris 3 Sorbonne Nouvelle
ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators
Pablo Ruiz, Thierry Poibeau and Frédérique Mélanie
pablo.ruiz.fabo@ens.fr
Our users’ needs in Entity Linking (EL)
The Problem
o Target users: social science researchers
o Performance of EL systems varies widely depending on corpus characteristics and the types of entities required
o It is difficult for users to choose the optimal EL system for their corpora
o Our target users wish to filter EL results, making informed choices about which entities to keep and which to discard
Our Approach
o Use public open-source tools
o Combine the outputs of several tools to get complementary results
o Provide metrics for users to evaluate the quality of an annotation
o Give simultaneous access to metrics and text to validate annotations
o Besides manual selection, allow automatic selection via weighted voting of annotations
Demo features
TRAFFIC-LIGHT MATRIX FORMAT
o Annotation confidence scores provided by the EL services
o Measures of coherence between an entity and the most representative entities in the corpus
› Wikipedia Link-based Measure: relatedness between two entities as a function of the Wikipedia pages that link to both of them and those that link to only one
Milne-Witten [3] coherence between entities e1 and e2 (as in Hoffart et al. [4])
› Other possible measures
• Distance between the entities’ categories in a Wikipedia category graph
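The Wikipedia link-based measure can be sketched as follows. This is a minimal illustration assuming the Milne-Witten formulation given by Hoffart et al. [4]; the function name and the choices to return 0 for pairs with no shared in-links and to clip negative values to 0 are our assumptions, not the demo's code.

```python
import math

def milne_witten(in_e1: set, in_e2: set, n_articles: int) -> float:
    """Milne-Witten relatedness, in the form used by Hoffart et al.:

    MW(e1, e2) = 1 - (log(max(|A|, |B|)) - log(|A & B|))
                     / (log(N) - log(min(|A|, |B|)))

    A, B: sets of Wikipedia pages linking to e1 and e2; N: number of pages.
    """
    common = len(in_e1 & in_e2)
    if common == 0:
        return 0.0  # no shared in-links: treated as unrelated (assumption)
    big = max(len(in_e1), len(in_e2))
    small = min(len(in_e1), len(in_e2))
    denom = math.log(n_articles) - math.log(small)
    if denom <= 0:
        return 0.0  # degenerate case: an entity is linked from every page
    score = 1 - (math.log(big) - math.log(common)) / denom
    return max(0.0, score)  # clip negative values to 0 (assumption)

# Toy page-ID sets, made up for illustration.
print(milne_witten({1, 2}, {1, 2}, 100))  # 1.0
print(milne_witten({1}, {2}, 100))        # 0.0
```

Identical in-link sets give 1.0 and disjoint ones give 0.0; partial overlap falls in between, scaled by how rare the linking pages are in the whole of Wikipedia.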
Corpus: a subset of the PoliInformatics corpus [2], about the 2008 US financial crisis
(1) Query via Search Text displays:
• Document Panel: Documents matching the query
• Entity Panel: Entities extracted from the documents displayed on the Document Panel, plus:
(2) Confidence scores for each annotator, normalized to a 0-1 range (T = TagMe, S = DBpedia Spotlight, W = Wikipedia Miner).
(3) Coherence score between the entity and a representative
subset of the corpus entities.
(4) Entities not coherent with the corpus are flagged in red.
(5) Query via Search Entities displays:
• Entity Panel: Entities matching the query.
• Document Panel: Documents containing one of the entities
displayed on the entity panel.
(6) Refine Search: Entities can be selected by type from a list (e.g. ORG) or individually with checkboxes.
(7) The Auto-Selection tab shows the output of an automatic
filtering via weighted voting of annotations.
(8) Charts: examples of co-occurrence networks, created offline from workflow information (sentence number, confidence, …)
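Callouts (2) to (4) can be illustrated with a small sketch. The poster does not say how scores are normalized, how the representative subset is chosen or where the red cut-off lies, so the min-max normalization, the use of mean relatedness as the coherence score and the colour thresholds below are all assumptions, and the helper names are hypothetical. `relatedness` can be any pairwise measure, such as the Milne-Witten measure cited above.

```python
from statistics import mean
from typing import Callable, Dict, Iterable

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one annotator's raw confidence scores to the 0-1 range.

    Assumption: min-max scaling over that annotator's scores in the corpus.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No spread to rescale; treating every score as maximal is arbitrary.
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def coherence(entity: str, representative: Iterable[str],
              relatedness: Callable[[str, str], float]) -> float:
    """Mean relatedness between an entity and a representative subset of
    corpus entities (hypothetical aggregation; the entity itself is skipped)."""
    others = [r for r in representative if r != entity]
    return mean(relatedness(entity, r) for r in others) if others else 0.0

def traffic_light(score: float, red_below: float = 0.3,
                  green_from: float = 0.6) -> str:
    """Map a 0-1 score to a traffic-light colour; thresholds are hypothetical."""
    if score < red_below:
        return "red"
    return "green" if score >= green_from else "amber"

# Toy usage: the pairwise relatedness values are made up for illustration.
toy = {("Apple_Inc.", "Lehman_Brothers"): 0.10,
       ("Apple_Inc.", "Federal_Reserve"): 0.05}
rel = lambda a, b: toy.get((a, b), toy.get((b, a), 0.0))
score = coherence("Apple_Inc.", ["Lehman_Brothers", "Federal_Reserve"], rel)
print(round(score, 3), traffic_light(score))  # 0.075 red
```

Normalizing per annotator keeps scores from services with different raw scales comparable in the same matrix column, while the colour mapping is only a display layer over the 0-1 values.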
[Figure: demo screenshot showing the Document Panel and the Entity Panel, numbered callouts (1) to (8), and a 0.0 to 1.0 score scale]
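Callout (8) mentions co-occurrence networks built offline from workflow information such as sentence numbers and confidence. A minimal sketch, assuming a hypothetical record layout of (document ID, sentence number, entity, confidence) and counting two entities as co-occurring when they are annotated in the same sentence:

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (doc_id, sentence_no, entity, confidence)
Annotation = Tuple[str, int, str, float]

def cooccurrence_edges(annotations: Iterable[Annotation],
                       min_conf: float = 0.0) -> Counter:
    """Count, for each entity pair, the sentences in which both occur.

    Annotations below min_conf are dropped before counting.
    """
    by_sentence: Dict[Tuple[str, int], Set[str]] = {}
    for doc_id, sent_no, entity, conf in annotations:
        if conf >= min_conf:
            by_sentence.setdefault((doc_id, sent_no), set()).add(entity)
    edges: Counter = Counter()
    for entities in by_sentence.values():
        # Sorting gives each undirected edge one canonical (a, b) key.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges

# Toy annotations, made up for illustration.
anns = [
    ("d1", 1, "Lehman_Brothers", 0.9), ("d1", 1, "Federal_Reserve", 0.8),
    ("d1", 2, "Federal_Reserve", 0.7), ("d1", 2, "Lehman_Brothers", 0.2),
    ("d2", 1, "Federal_Reserve", 0.9), ("d2", 1, "Lehman_Brothers", 0.95),
]
print(cooccurrence_edges(anns, min_conf=0.5))
# Counter({('Federal_Reserve', 'Lehman_Brothers'): 2})
```

The edge counts can then be loaded as weighted edges into any graph or visualization tool; raising `min_conf` trades recall for cleaner networks.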
System workflows
o The user always has access to the full results, but the workflow can also select a subset of the annotations automatically.
o The workflow combines, via weighted voting, the outputs of TagMe2, DBpedia Spotlight and Wikipedia Miner (shown on the demo), and of AIDA and Babelfy (not shown on the demo).
o Votes are weighted according to each annotator’s precision on two reference corpora (IITB and AIDA/CoNLL B), depending on whether or not the user requires annotations for common-noun entity mentions.
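The voting step can be sketched roughly as below. This is not the authors' implementation (see [5] for that): the threshold, the rule that abstaining annotators still count in the denominator, and the mapping of IITB precisions to common-noun mentions and AIDA/CoNLL B precisions to the other case are all assumptions, and the weights in the usage line are placeholders, not measured precisions.

```python
from collections import defaultdict
from typing import Dict, Optional

def choose_weights(common_nouns: bool, prec_iitb: Dict[str, float],
                   prec_aida: Dict[str, float]) -> Dict[str, float]:
    """Assumption: IITB precisions when common-noun mentions are wanted,
    AIDA/CoNLL B precisions otherwise."""
    return prec_iitb if common_nouns else prec_aida

def weighted_vote(candidates: Dict[str, Optional[str]],
                  weights: Dict[str, float],
                  threshold: float = 0.5) -> Optional[str]:
    """Choose the entity for one mention by precision-weighted voting.

    candidates: annotator -> entity proposed for the mention (None = abstain)
    weights:    annotator -> precision on the chosen reference corpus
    threshold:  hypothetical minimum share of total weight for the winner
    """
    totals: Dict[str, float] = defaultdict(float)
    for annotator, entity in candidates.items():
        if entity is not None:
            totals[entity] += weights.get(annotator, 0.0)
    if not totals:
        return None
    winner, score = max(totals.items(), key=lambda kv: kv[1])
    # Abstainers stay in the denominator, so mentions proposed by few
    # annotators need more weight to be selected.
    total_weight = sum(weights.get(a, 0.0) for a in candidates)
    if total_weight == 0 or score / total_weight < threshold:
        return None
    return winner

w = {"tagme": 0.6, "spotlight": 0.5, "wminer": 0.4}  # placeholder weights
print(weighted_vote({"tagme": "Lehman_Brothers",
                     "spotlight": "Lehman_Brothers",
                     "wminer": "Lehman_(surname)"}, w))  # Lehman_Brothers
```

Weighting by precision lets a reliable annotator outvote a noisier one, and the threshold turns the vote into a filter: mentions on which the annotators disagree too much are left out of the automatic selection but remain visible in the full results.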
Evaluation
o Automatically combining EL systems improved results over each individual system ([5], our *SEM poster).
o Assessed with the strong annotation match and entity match measures [6] on four corpora: AIDA/CoNLL B, IITB, MSNBC and AQUAINT.
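The two match measures can be approximated as follows. In this simplified sketch, a strong annotation match requires identical mention boundaries and the same entity, while entity match compares only the set of entities per document; scores are micro-averaged and entity identifiers are compared as exact strings. The benchmarking framework in [6] is more thorough, so treat this as an illustration only; the helper names are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity)

def prf(tp: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def strong_annotation_match(gold: List[Set[Span]], pred: List[Set[Span]]):
    """Micro P/R/F1: a prediction counts only if span and entity both match."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    return prf(tp, sum(map(len, pred)), sum(map(len, gold)))

def entity_match(gold: List[Set[Span]], pred: List[Set[Span]]):
    """Micro P/R/F1 on the set of entities per document, ignoring spans."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        ge = {e for _, _, e in g}
        pe = {e for _, _, e in p}
        tp += len(ge & pe)
        n_pred += len(pe)
        n_gold += len(ge)
    return prf(tp, n_pred, n_gold)

# Toy one-document example, made up for illustration.
gold = [{(0, 6, "Lehman_Brothers"), (20, 35, "Federal_Reserve")}]
pred = [{(0, 6, "Lehman_Brothers"), (21, 35, "Federal_Reserve"),
         (40, 45, "Apple_Inc.")}]
print(strong_annotation_match(gold, pred))
print(entity_match(gold, pred))
```

In the toy example the off-by-one span for Federal_Reserve is an error under strong annotation match (roughly P = 0.33, R = 0.5, F1 = 0.4) but not under entity match (P = 0.67, R = 1.0, F1 = 0.8), which is why the two measures are reported together.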
[1] T. Venturini & D. Guido. 2012. Once upon a text. An ANT [Actor-Network Theory] Tale in Text Analytics. Sociologica, 3:1-17. Il Mulino, Bologna.
[2] N. Smith et al. 2014. Overview of the 2014 NLP Unshared Task in PoliInformatics. In Proc. ACL LACSS Workshop.
[3] D. Milne & I. Witten. 2008. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proc. AAAI Workshop on Wikipedia and Artificial Intelligence.
[4] J. Hoffart et al. 2011. Robust disambiguation of named entities in text. In Proc. EMNLP.
[5] P. Ruiz & T. Poibeau. 2015. Combining open source annotators for entity linking through weighted voting. In Proc. *SEM.
[6] M. Cornolti, P. Ferragina & M. Ciaramita. 2013. A framework for benchmarking entity-annotation systems. In Proc. WWW, 249-260.
Metrics to assist in manual filtering
Annotation voting for automatic filtering
DEMO LINK: http://129.199.228.10/nav/gui/
