June 3, 2015
Troubleshooting and Optimizing Named Entity
Resolution Systems in the Industry
Panos Alexopoulos
Semantic Applications Research Manager
Expert System Iberia
ESWC 2015
Portoroz, Slovenia
Named Entity Resolution
● An Information Extraction task where:
● We detect mentions of named entities in texts (e.g. people,
organizations, locations, etc.).
● We map these mentions to the entities they refer to in an
unambiguous way.
Figure from Hoffart et al: “Robust Disambiguation of Named Entities in Text”
NER Tools and Frameworks
Different features (background
knowledge, algorithms, customization,
etc.)
Effectiveness empirically measured in
various experiments with several
datasets
NER Systems Evaluations (F1 Scores)
● AIDA
● 83% on the AIDA-YAGO2 dataset
● 62% on Reuters-21578
● DBPedia Spotlight
● 81% on a set of 155,000 wikilink samples
● 56% on a set of 35 paragraphs from New York
● 34% on the AIDA/CoNLL-TestB dataset
● AGDISTIS
● 76% on the AQUAINT dataset
● 60% on the AIDA/CoNLL-TestB dataset
● 31% on the IITB dataset
What the evaluations show
A NER system’s satisfactory performance
in a given scenario does not constitute a
trustworthy predictor of its performance
in a different scenario.
Our goal is to avoid the situation below
“Our system normally has 85% precision.”
“Yes, but for this client, it only achieves 25%!”
What we have done about that
1. We analyzed the typical way a NER system works and
identified (some) potential causes of low NER
effectiveness.
2. We defined a set of metrics for determining in more depth
the causes for low NER effectiveness in a given scenario.
3. We mapped the potential values of these metrics to
actions that may increase NER effectiveness.
How a NER system typically works
When things may go wrong
● Low Precision: The text does not really contain the system-assigned
entities.
● Potential Reasons:
● High ambiguity in the domain and/or texts.
● The evidence applied is not sufficient or appropriate for the texts we have.
● Low Recall: System fails to detect entities in the text that are actually
there.
● Potential Reasons:
● The thesaurus is incomplete.
● The system requires a certain minimum amount of evidence per
entity but cannot find it, either in the background knowledge or in
the texts.
How do we know what happens in our scenario?
● We calculate two sets of metrics:
● Ambiguity Metrics: Measure the level of ambiguity in our domain
knowledge and texts.
● Evidence Adequacy Metrics: Measure how appropriate the
domain knowledge that we apply as evidence is.
● To do that we first perform:
● Manual annotation of a representative sample of the input texts
with target and non-target entities from the knowledge graph.
● Automatic annotation of the same texts without any
disambiguation (i.e., term matching only).
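The disambiguation-free automatic annotation step amounts to plain term matching against the knowledge graph's lexicon. A minimal Python sketch (the `lexicon` structure and the substring matching are illustrative assumptions, not prescribed by the slides):

```python
def term_match(text, lexicon):
    """Disambiguation-free annotation: map every surface form found in the
    text to ALL candidate entities it may denote. `lexicon` is an assumed
    structure: surface form -> set of entity IDs from the knowledge graph."""
    mentions = {}
    for name, entities in lexicon.items():
        if name in text:  # crude substring match, good enough for a sketch
            mentions[name] = set(entities)
    return mentions
```

Comparing these candidate sets against the manual gold annotation is what the ambiguity and evidence adequacy metrics quantify.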
Ambiguity Types
● Lexical Ambiguity of entity names, i.e. ambiguity between the target entities
and common non-entity terms
● E.g., “Page” – the piece of paper or a person?
● Target Entity Ambiguity, i.e., ambiguity between the target entities
● E.g., “Tripoli” – the one in Greece or the one in Libya?
● Knowledge Graph Ambiguity, i.e., ambiguity between the target entities and
other entities in the ontology.
● E.g., “Barcelona” - the team or the city?
● Global Ambiguity, i.e. ambiguity between the target entities and entities from
other domains, not covered by our knowledge graph.
● E.g., “Orange” - the company or the fruit?
Ambiguity Metrics
● Lexical Ambiguity: The percentage of terms which:
● Are common lexical terms rather than entities in the text.
● Have been wrongly mapped by the system to one or more target entities.
● Target Entity Ambiguity: The percentage of terms which:
● Correspond to a target entity.
● Have been mapped by the system to this target entity but also to other
target entities.
● Global Ambiguity: The percentage of terms which:
● Are not common lexical terms but actual entities in the texts.
● Do not correspond to any knowledge graph entity.
● Have been wrongly mapped by the system to one or more target entities.
Ambiguity Metrics
● Knowledge Graph Ambiguity: Two metrics:
● KGA1: The percentage of terms which:
● Correspond to a target entity
● Have been mapped by the system to this entity but also to other
non-target entities.
● KGA2: The percentage of terms which:
● Correspond to a non-target entity
● Have been mapped by the system to this entity but also to other
target entities.
● KGA1 shows how noisy our Knowledge Graph is with respect to the texts!
● KGA2 shows how noisy our texts are with respect to the Knowledge Graph!
Evidence Adequacy Metrics
● Knowledge Graph Richness
● Percentage of target entities with no related entities in the graph.
● Average number of entities a target entity is related to (overall and per
relation) in the graph.
● Knowledge Graph Prevalence in Texts
● Percentage of target entities for which there is at least one evidential
entity in the texts (overall and per relation).
● Average number of evidential entities a target entity is related to in the
texts (overall and per relation).
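These adequacy metrics are simple aggregations over the target entities. A sketch, assuming `related` maps each target entity to its related entities in the graph and `text_entities` is the set of entities annotated in the sample texts (the per-relation variants just restrict `related` to one relation):

```python
def kg_richness(target_entities, related):
    """Share of target entities with no related entities in the graph, and
    the average number of related entities per target entity."""
    n = len(target_entities)
    no_related = sum(1 for e in target_entities if not related.get(e)) / n
    avg_related = sum(len(related.get(e, ())) for e in target_entities) / n
    return no_related, avg_related

def kg_text_prevalence(target_entities, related, text_entities):
    """Share of target entities with at least one evidential (related) entity
    that actually appears in the texts, and the average such count."""
    n = len(target_entities)
    counts = [len(related.get(e, set()) & text_entities) for e in target_entities]
    return sum(1 for c in counts if c > 0) / n, sum(counts) / n
```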
Interpreting and acting on the metrics
Metric Values: High Lexical Ambiguity
Diagnosis: The NER system cannot perform Word Sense Disambiguation well enough.
Action: Improve the linguistic analysis of the NER system.

Metric Values: High Global Ambiguity
Diagnosis: Many of the input texts are not really related to the domain of the target entities.
Action: Use a domain/topic classifier to filter out the non-relevant texts and apply the NER process only to the relevant ones.

Metric Values: High KGA1, Low KGA2
Diagnosis: The evidence knowledge graph contains several non-target entities that hamper the disambiguation process rather than helping it.
Action: Prune the evidence knowledge graph and keep only the most prevalent entities.
Interpreting and acting on the metrics
Metric Values: Low Knowledge Graph Richness
Diagnosis: The knowledge graph is not adequate as disambiguation evidence.
Action: Enrich the knowledge graph, starting from the most prevalent relations.

Metric Values: High Knowledge Graph Richness, Low Text Prevalence
Diagnosis: The knowledge graph is not adequate as disambiguation evidence.
Action: Change or expand the knowledge graph with entities that are more likely to appear in the texts.

Metric Values: Low Knowledge Graph Text Prevalence, Low Target Entity Ambiguity, Low Knowledge Graph Ambiguity
Diagnosis: The system’s minimum evidence threshold is too high.
Action: Decrease the threshold.
Framework Application Cases
Case 1: Football
● Target Texts: Short textual descriptions of video scenes from football matches.
● Target Entities: Football players and teams.
● NER System: Knowledge Tagger (in-house)
● Knowledge Graph: DBPedia
● (Initial) NER Effectiveness: P = 60%, R = 55%

Case 2: Startups
● Target Texts: News articles from JSI’s Newsfeed
● Target Entities: Startup companies
● NER System: Knowledge Tagger (in-house)
● Knowledge Graph: custom-built, containing info about founders, investors, competitors, etc.
● (Initial) NER Effectiveness: P = 35%, R = 50%
Ambiguity Metrics
Metric                    Football   Startups
Lexical Ambiguity         1%         10%
Target Entity Ambiguity   30%        4%
KGA1                      56%        4%
KGA2                      4%         3%
Global Ambiguity          2%         40%
Evidence Adequacy Metrics - Knowledge Graph Prevalence
Football:
Relation                                Prevalence
Players and their current club          85%
Players and their current co-players    95%
Players and their current managers      75%
Players and their nationality           10%
Players and their place of birth        2%

Startups:
Relation                                Prevalence
Companies and their business areas      50%
Companies and their founders            40%
Companies and their competitors         35%
Companies and their CEO                 20%
Companies and their investors           15%
From metrics to actions

Football
● Metric Values:
● High KGA1
● Low KGA2
● Good Evidence Adequacy
● Actions:
● We pruned the knowledge graph by removing non-football-related entities and keeping only the 3 most prevalent relations.
● Achieved NER Effectiveness: P = 82%, R = 80%

Startups
● Metric Values:
● Considerable Lexical Ambiguity
● High Global Ambiguity
● Mediocre Evidence Adequacy
● Actions:
● Applied heuristic rules for company name detection.
● Applied a classifier to filter out irrelevant news articles.
● Reduced the evidence threshold.
● Achieved NER Effectiveness: P = 78%, R = 62%
Wrapping Up
● Key Points:
● Our NER diagnostics framework is informal and crude but has proved
very helpful in optimizing our clients’ NER deployments.
● The main lesson we’ve learned is that it’s very hard to build one NER
solution for all possible scenarios; therefore NER systems must be easily
and intuitively customizable.
● Future Agenda:
● Implement a comprehensive and intuitive visualization of the metrics.
● Define metrics for measuring the evidential adequacy of textual
knowledge resources.
● Automate the interpretation of metric values by means of formal rules.
Thank you for your attention!
Dr. Panos Alexopoulos
Semantic Applications Research Manager
Email:  palexopoulos@expertsystem.com
Web: www.panosalexopoulos.com
LinkedIn: www.linkedin.com/in/panosalexopoulos
Twitter: @PAlexop

More Related Content

What's hot

A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
IJECEIAES
 
Sentiment analysis
Sentiment analysisSentiment analysis
Sentiment analysis
Amenda Joy
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
Journal For Research
 
Project sentiment analysis
Project sentiment analysisProject sentiment analysis
Project sentiment analysis
Bob Prieto
 
A review of sentiment analysis approaches in big
A review of sentiment analysis approaches in bigA review of sentiment analysis approaches in big
A review of sentiment analysis approaches in big
Nurfadhlina Mohd Sharef
 
Sentiment Analysis of Feedback Data
Sentiment Analysis of Feedback DataSentiment Analysis of Feedback Data
Sentiment Analysis of Feedback Data
ijtsrd
 
295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis
Zahid Azam
 
NLP Ecosystem
NLP EcosystemNLP Ecosystem
Datapedia Analysis Report
Datapedia Analysis ReportDatapedia Analysis Report
Datapedia Analysis Report
Abanoub Amgad
 
Predictive Text Analytics
Predictive Text AnalyticsPredictive Text Analytics
Predictive Text Analytics
Seth Grimes
 
Text Analytics for Dummies 2010
Text Analytics for Dummies 2010Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
Seth Grimes
 
Project report
Project reportProject report
Project report
Utkarsh Soni
 
Lexalytics Text Analytics Workshop: Perfect Text Analytics
Lexalytics Text Analytics Workshop: Perfect Text AnalyticsLexalytics Text Analytics Workshop: Perfect Text Analytics
Lexalytics Text Analytics Workshop: Perfect Text Analytics
Lexalytics
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysis
Diana Maynard
 
An Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationAn Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentation
Seth Grimes
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysis
sneha penmetsa
 
Query recommendation papers
Query recommendation papersQuery recommendation papers
Query recommendation papers
Ashish Kulkarni
 
Comparative Study on Lexicon-based sentiment analysers over Negative sentiment
Comparative Study on Lexicon-based sentiment analysers over Negative sentimentComparative Study on Lexicon-based sentiment analysers over Negative sentiment
Comparative Study on Lexicon-based sentiment analysers over Negative sentiment
AI Publications
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
Wagston Staehler
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
ishan0019
 

What's hot (20)

A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
 
Sentiment analysis
Sentiment analysisSentiment analysis
Sentiment analysis
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
 
Project sentiment analysis
Project sentiment analysisProject sentiment analysis
Project sentiment analysis
 
A review of sentiment analysis approaches in big
A review of sentiment analysis approaches in bigA review of sentiment analysis approaches in big
A review of sentiment analysis approaches in big
 
Sentiment Analysis of Feedback Data
Sentiment Analysis of Feedback DataSentiment Analysis of Feedback Data
Sentiment Analysis of Feedback Data
 
295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis295B_Report_Sentiment_analysis
295B_Report_Sentiment_analysis
 
NLP Ecosystem
NLP EcosystemNLP Ecosystem
NLP Ecosystem
 
Datapedia Analysis Report
Datapedia Analysis ReportDatapedia Analysis Report
Datapedia Analysis Report
 
Predictive Text Analytics
Predictive Text AnalyticsPredictive Text Analytics
Predictive Text Analytics
 
Text Analytics for Dummies 2010
Text Analytics for Dummies 2010Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
 
Project report
Project reportProject report
Project report
 
Lexalytics Text Analytics Workshop: Perfect Text Analytics
Lexalytics Text Analytics Workshop: Perfect Text AnalyticsLexalytics Text Analytics Workshop: Perfect Text Analytics
Lexalytics Text Analytics Workshop: Perfect Text Analytics
 
Practical sentiment analysis
Practical sentiment analysisPractical sentiment analysis
Practical sentiment analysis
 
An Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationAn Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentation
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysis
 
Query recommendation papers
Query recommendation papersQuery recommendation papers
Query recommendation papers
 
Comparative Study on Lexicon-based sentiment analysers over Negative sentiment
Comparative Study on Lexicon-based sentiment analysers over Negative sentimentComparative Study on Lexicon-based sentiment analysers over Negative sentiment
Comparative Study on Lexicon-based sentiment analysers over Negative sentiment
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 

Similar to Troubleshooting and Optimizing Named Entity Resolution Systems in the Industry

Environmental Health – PHE 443U-001Winter Term 2018Understandi.docx
Environmental Health – PHE 443U-001Winter Term 2018Understandi.docxEnvironmental Health – PHE 443U-001Winter Term 2018Understandi.docx
Environmental Health – PHE 443U-001Winter Term 2018Understandi.docx
SALU18
 
Reference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxReference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptx
Chimezie Ogbuji
 
Lesson1
Lesson1Lesson1
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Ahmed Elmalla
 
Veda Semantics - introduction document
Veda Semantics - introduction documentVeda Semantics - introduction document
Veda Semantics - introduction document
rajatkr
 
TitleABC123 Version X1Clinical Preparation Checklist.docx
TitleABC123 Version X1Clinical Preparation Checklist.docxTitleABC123 Version X1Clinical Preparation Checklist.docx
TitleABC123 Version X1Clinical Preparation Checklist.docx
herthalearmont
 
Entity Search Engine
Entity Search Engine Entity Search Engine
Resume_Clasification.pptx
Resume_Clasification.pptxResume_Clasification.pptx
Resume_Clasification.pptx
MOINDALVS
 
Software prototyping.pptx
Software prototyping.pptxSoftware prototyping.pptx
Software prototyping.pptx
DrTThendralCompSci
 
Business analyst
Business analystBusiness analyst
Business analyst
Hemanth Kumar
 
Lu2 introduction to statistics
Lu2 introduction to statisticsLu2 introduction to statistics
Lu2 introduction to statistics
LamineKaba6
 
Basics of nursing informatics
Basics of nursing informaticsBasics of nursing informatics
Basics of nursing informatics
IanSuson
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
Diana Maynard
 
Data Science 1.pdf
Data Science 1.pdfData Science 1.pdf
Data Science 1.pdf
ArchanaArya17
 
Big Data Analytics - It is here and now!
Big Data Analytics - It is here and now!Big Data Analytics - It is here and now!
Big Data Analytics - It is here and now!
Farhan Khan
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
Boston Institute of Analytics
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET Journal
 
Analysis of Metadata and Topic Modeling for
Analysis of Metadata and Topic Modeling forAnalysis of Metadata and Topic Modeling for
Analysis of Metadata and Topic Modeling for
Jigar Mehta
 
Fake news detection
Fake news detection Fake news detection
Fake news detection
shalushamil
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
PrashantYadav931011
 

Similar to Troubleshooting and Optimizing Named Entity Resolution Systems in the Industry (20)

Environmental Health – PHE 443U-001Winter Term 2018Understandi.docx
Environmental Health – PHE 443U-001Winter Term 2018Understandi.docxEnvironmental Health – PHE 443U-001Winter Term 2018Understandi.docx
Environmental Health – PHE 443U-001Winter Term 2018Understandi.docx
 
Reference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxReference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptx
 
Lesson1
Lesson1Lesson1
Lesson1
 
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
 
Veda Semantics - introduction document
Veda Semantics - introduction documentVeda Semantics - introduction document
Veda Semantics - introduction document
 
TitleABC123 Version X1Clinical Preparation Checklist.docx
TitleABC123 Version X1Clinical Preparation Checklist.docxTitleABC123 Version X1Clinical Preparation Checklist.docx
TitleABC123 Version X1Clinical Preparation Checklist.docx
 
Entity Search Engine
Entity Search Engine Entity Search Engine
Entity Search Engine
 
Resume_Clasification.pptx
Resume_Clasification.pptxResume_Clasification.pptx
Resume_Clasification.pptx
 
Software prototyping.pptx
Software prototyping.pptxSoftware prototyping.pptx
Software prototyping.pptx
 
Business analyst
Business analystBusiness analyst
Business analyst
 
Lu2 introduction to statistics
Lu2 introduction to statisticsLu2 introduction to statistics
Lu2 introduction to statistics
 
Basics of nursing informatics
Basics of nursing informaticsBasics of nursing informatics
Basics of nursing informatics
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Data Science 1.pdf
Data Science 1.pdfData Science 1.pdf
Data Science 1.pdf
 
Big Data Analytics - It is here and now!
Big Data Analytics - It is here and now!Big Data Analytics - It is here and now!
Big Data Analytics - It is here and now!
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine Learning
 
Analysis of Metadata and Topic Modeling for
Analysis of Metadata and Topic Modeling forAnalysis of Metadata and Topic Modeling for
Analysis of Metadata and Topic Modeling for
 
Fake news detection
Fake news detection Fake news detection
Fake news detection
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 

Recently uploaded

Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 

Recently uploaded (20)

Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
WeTestAthens: Postman's AI & Automation Techniques

● Low Recall: The system fails to detect entities in the text that are actually there.
● Potential Reasons:
● The thesaurus is incomplete.
● The system requires a certain minimum amount of evidence per entity but cannot find it, either in the background knowledge or in the texts.
How do we know what happens in our scenario

● We calculate two sets of metrics:
● Ambiguity Metrics: measure the level of ambiguity in our domain knowledge and texts.
● Evidence Adequacy Metrics: measure how appropriate the domain knowledge that we apply as evidence is.
● To do that, we first perform:
● Manual annotation of a representative sample of the input texts with target and non-target entities from the knowledge graph.
● Automatic annotation of the same texts without any disambiguation (i.e. term matching only).
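To illustrate the bookkeeping this comparison requires, here is a minimal Python sketch of the record one might keep per matched term, pairing the manual (gold) annotation with the candidates the disambiguation-free matcher proposed. All names here (`TermAnnotation`, `gold_entity`, etc.) are invented for this example and are not taken from the system described.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# One record per term match in the annotated sample; field names are
# illustrative, not from the paper.
@dataclass
class TermAnnotation:
    surface: str                   # matched text span, e.g. "Barcelona"
    gold_entity: Optional[str]     # manually assigned entity id, or None
                                   # if the term is just a common word
    in_knowledge_graph: bool       # does the gold entity exist in the KG?
    candidates: Set[str] = field(default_factory=set)  # entity ids the
                                   # term matcher mapped this mention to

# "Page" is a common noun in this hypothetical text, yet the matcher
# links it to a person entity: a lexical-ambiguity case.
page = TermAnnotation("Page", gold_entity=None, in_knowledge_graph=False,
                      candidates={"dbpedia:Jimmy_Page"})
```

A corpus of such records is all the later metrics need as input.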
Ambiguity Types

● Lexical Ambiguity of entity names, i.e. ambiguity between the target entities and common non-entity terms.
● E.g., “Page” – the piece of paper or a person?
● Target Entity Ambiguity, i.e. ambiguity between the target entities.
● E.g., “Tripoli” – the one in Greece or the one in Libya?
● Knowledge Graph Ambiguity, i.e. ambiguity between the target entities and other entities in the ontology.
● E.g., “Barcelona” – the team or the city?
● Global Ambiguity, i.e. ambiguity between the target entities and entities from other domains, not covered by our knowledge graph.
● E.g., “Orange” – the company or the fruit?
Ambiguity Metrics

● Lexical Ambiguity: the percentage of terms which:
● Are common lexical terms rather than entities in the text.
● Have been wrongly mapped by the system to one or more target entities.
● Target Entity Ambiguity: the percentage of terms which:
● Correspond to a target entity.
● Have been mapped by the system to this target entity but also to other target entities.
● Global Ambiguity: the percentage of terms which:
● Are not common lexical terms but actual entities in the texts.
● Do not correspond to any knowledge graph entity.
● Have been wrongly mapped by the system to one or more target entities.
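Once the sample is annotated, these three percentages can be computed mechanically. The Python sketch below assumes each term is a plain dict with invented keys: 'gold' (the gold entity id, or None for a common word), 'in_kg' (whether the gold entity exists in the knowledge graph), and 'matches' (the set of entity ids the term matcher proposed).

```python
# Sketch of the three corpus-level ambiguity percentages, computed over
# a manually checked sample of term matches.
def ambiguity_metrics(terms, target_entities):
    n = len(terms)
    # common words wrongly linked to at least one target entity
    lexical = sum(1 for t in terms
                  if t["gold"] is None and t["matches"] & target_entities)
    # target-entity mentions linked to their entity *and* to other targets
    target = sum(1 for t in terms
                 if t["gold"] in target_entities
                 and t["gold"] in t["matches"]
                 and len(t["matches"] & target_entities) > 1)
    # real entities absent from the knowledge graph, linked to targets anyway
    global_ = sum(1 for t in terms
                  if t["gold"] is not None and not t["in_kg"]
                  and t["matches"] & target_entities)
    return {"lexical": 100.0 * lexical / n,
            "target": 100.0 * target / n,
            "global": 100.0 * global_ / n}
```

Each counter mirrors one bullet list above, so the implementation doubles as a precise restatement of the definitions.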
Ambiguity Metrics

● Knowledge Graph Ambiguity: two metrics:
● KGA1: the percentage of terms which:
● Correspond to a target entity.
● Have been mapped by the system to this entity but also to other non-target entities.
● KGA2: the percentage of terms which:
● Correspond to a non-target entity.
● Have been mapped by the system to this entity but also to other target entities.
● KGA1 shows how noisy our Knowledge Graph is with respect to the texts!
● KGA2 shows how noisy our texts are with respect to the Knowledge Graph!
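The same kind of counting yields KGA1 and KGA2. In this sketch each term dict carries 'gold' (the gold entity id, or None) and 'matches' (every entity id the matcher proposed, target or not); the key names are invented.

```python
# Sketch of the two knowledge-graph ambiguity metrics defined above.
def kg_ambiguity(terms, target_entities):
    n = len(terms)
    # KGA1: target-entity mentions also matched to non-target KG entities
    kga1 = sum(1 for t in terms
               if t["gold"] in target_entities
               and t["gold"] in t["matches"]
               and t["matches"] - target_entities)
    # KGA2: non-target-entity mentions also matched to target entities
    kga2 = sum(1 for t in terms
               if t["gold"] is not None
               and t["gold"] not in target_entities
               and t["gold"] in t["matches"]
               and t["matches"] & target_entities)
    return 100.0 * kga1 / n, 100.0 * kga2 / n
```

The set difference in the first counter and the set intersection in the second capture the "but also to other ..." clauses directly.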
Evidence Adequacy Metrics

● Knowledge Graph Richness
● Percentage of target entities with no related entities in the graph.
● Average number of entities a target entity is related to (overall and per relation) in the graph.
● Knowledge Graph Prevalence in Texts
● Percentage of target entities for which there is at least one evidential entity in the texts (overall and per relation).
● Average number of evidential entities a target entity is related to in the texts (overall and per relation).
Interpreting and acting on the metrics

● Metric Values: High Lexical Ambiguity
● Diagnosis: The NER system cannot perform Word Sense Disambiguation well enough.
● Action: Improve the linguistic analysis of the NER system.
● Metric Values: High Global Ambiguity
● Diagnosis: Many of the input texts are not really related to the domain of the target entities.
● Action: Use a domain/topic classifier to filter out the non-relevant texts and apply the NER process only to the relevant ones.
● Metric Values: High KGA1, Low KGA2
● Diagnosis: The evidence knowledge graph contains several non-target entities that hamper the disambiguation process rather than helping it.
● Action: Prune the evidence knowledge graph and keep the most prevalent entities.
Interpreting and acting on the metrics

● Metric Values: Low Knowledge Graph Richness
● Diagnosis: The knowledge graph is not adequate as disambiguation evidence.
● Action: Enrich the knowledge graph, starting from the most prevalent relations.
● Metric Values: High Knowledge Graph Richness, Low Text Prevalence
● Diagnosis: The knowledge graph is not adequate as disambiguation evidence.
● Action: Change or expand the knowledge graph with entities that are more likely to appear in the texts.
● Metric Values: Low Knowledge Graph Text Prevalence, Low Target Entity Ambiguity, Low Knowledge Graph Ambiguity
● Diagnosis: The system’s minimum evidence threshold is too high.
● Action: Decrease the threshold.
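These diagnosis tables are essentially a small rule base, so they lend themselves to a direct rule encoding. The sketch below is illustrative only: the 0.3 cut-off separating "high" from "low" is invented for the example, as the tables leave that judgement to the analyst, and metric values are assumed to be fractions in [0, 1].

```python
# Toy rule-based encoding of (part of) the diagnosis tables above.
# Missing metrics default to unsuspicious values.
def diagnose(m, high=0.3):
    actions = []
    if m.get("lexical_ambiguity", 0.0) >= high:
        actions.append("Improve the linguistic analysis of the NER system.")
    if m.get("global_ambiguity", 0.0) >= high:
        actions.append("Filter out non-relevant texts with a domain/topic "
                       "classifier before running NER.")
    if m.get("kga1", 0.0) >= high and m.get("kga2", 1.0) < high:
        actions.append("Prune the knowledge graph, keeping only the most "
                       "prevalent entities.")
    if m.get("kg_richness", 1.0) < high:
        actions.append("Enrich the knowledge graph, starting from the most "
                       "prevalent relations.")
    return actions
```

Fed the Football case below (KGA1 = 56%, KGA2 = 4%), such a rule base would suggest exactly the pruning action that was actually taken.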
Framework Application Cases

Case 1: Football
● Target Texts: short textual descriptions of video scenes from football matches.
● Target Entities: football players and teams.
● NER System: Knowledge Tagger (in-house)
● Knowledge Graph: DBPedia
● (Initial) NER Effectiveness: P = 60%, R = 55%

Case 2: Startups
● Target Texts: news articles from JSI’s Newsfeed.
● Target Entities: startup companies.
● NER System: Knowledge Tagger (in-house)
● Knowledge Graph: custom-built, containing info about founders, investors, competitors, etc.
● (Initial) NER Effectiveness: P = 35%, R = 50%
Ambiguity Metrics

Metric                    Football   Startups
Lexical Ambiguity         1%         10%
Target Entity Ambiguity   30%        4%
KGA1                      56%        4%
KGA2                      4%         3%
Global Ambiguity          2%         40%
Evidence Adequacy Metrics - Knowledge Graph Prevalence

Football
Relation                                Prevalence
Players and their current club          85%
Players and their current co-players    95%
Players and their current managers      75%
Players and their nationality           10%
Players and their place of birth        2%

Startups
Relation                                Prevalence
Companies and their business areas      50%
Companies and their founders            40%
Companies and their competitors         35%
Companies and their CEO                 20%
Companies and their investors           15%
From metrics to actions

Football
● Metric Values:
● High KGA1
● Low KGA2
● Good Evidence Adequacy
● Actions:
● We pruned the knowledge graph by removing non-football-related entities and keeping only the 3 most prevalent relations.
● Achieved NER Effectiveness: P = 82%, R = 80%

Startups
● Metric Values:
● Considerable Lexical Ambiguity
● High Global Ambiguity
● Mediocre Evidence Adequacy
● Actions:
● Applied heuristic rules for company name detection.
● Applied a classifier to filter out irrelevant news articles.
● Reduced the evidence threshold.
● Achieved NER Effectiveness: P = 78%, R = 62%
Wrapping Up

● Key Points:
● Our NER diagnostics framework is informal and crude, but it has proved very helpful in optimizing our clients’ NER deployments.
● The main lesson we’ve learned is that it is very hard to build one NER solution for all possible scenarios; therefore NER systems must be easily and intuitively customizable.
● Future Agenda:
● Implement a comprehensive and intuitive visualization of the metrics.
● Define metrics for measuring the evidential adequacy of textual knowledge resources.
● Automate the interpretation of metric values by means of formal rules.
Thank you for your attention!

Dr. Panos Alexopoulos
Semantic Applications Research Manager
Email: palexopoulos@expertsystem.com
Web: www.panosalexopoulos.com
LinkedIn: www.linkedin.com/in/panosalexopoulos
Twitter: @PAlexop