SANAPHOR: Ontology-Based Coreference Resolution
Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz,
Djellel Difallah and Philippe Cudré-Mauroux
eXascale Infolab
University of Fribourg, Switzerland
October 14th, ISWC’15
Bethlehem PA, USA
Motivations and Task Overview
Task: identify groups (clusters) of co-referring mentions.
Example:
“Xi Jinping was due to arrive in Washington for a dinner with
Barack Obama on Thursday night, in which he will aim to
reassure the US president about a rising China. The Chinese
president said he favors a “new model of major country
relationship" built on understanding, rather than suspicion.”
http://www.telegraph.co.uk/
Benefits:
• identify the specific type of an unknown entity
• extract more relationships between named entities
State of the Art in Coreference Resolution
The best approaches use a generic multi-step algorithm:
1. Pre-processing (POS tagging, parsing, NER)
2. Identification of referring expressions (e.g., pronouns)
3. Anaphoricity determination (“it rains” vs “he took it”)
4. Generation of antecedent candidates
5. Searching/Clustering of candidates
Lee et al., Stanford’s multi-pass sieve coreference resolution system at the
CoNLL-2011 shared task
Motivations for a rich semantic layer
http://www.telegraph.co.uk/
“Xi Jinping was due to arrive in Washington for a dinner
with Barack Obama on Thursday night, in which he will
aim to reassure the US president about a rising China.
The Chinese president said he favors a “new model of
major country relationship" built on understanding, rather
than suspicion.”
Syntactic approaches are not able to differentiate between
the names of the city and the province.
Semantic layer on top of an existing system
Documents → Stanford Coref (Deterministic Coreference Resolution) → mentions in clusters:
[US President] [Barack Obama]
[Australia] [Quintex Australia] [Quintex ltd.]
Generic overview of the approach
Key technique: split and merge clusters based on their semantics.
Pipeline: Clusters produced by Stanford Coref → SANAPHOR: Entity/Type Linking → Split clusters → Merge clusters
Pre-Processing: Entity Linking
Entity Linking maps each mention to an entity where possible:
US President ⇒ (not linked)
Barack Obama ⇒ e1: Barack Obama
Australia ⇒ e2: Australia
Quintex Australia ⇒ e3: Quintex Australia
Quintex ltd. ⇒ e3: Quintex ltd.
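The linking step above can be sketched as a simple lookup. This is a minimal illustration, assuming a hypothetical in-memory table (`ENTITY_INDEX`) in place of a real entity linker; only the identifiers e1 to e3 and the surface forms come from the slide.

```python
from typing import Optional

# Hypothetical lookup table standing in for a real entity linker.
# Surface forms and identifiers mirror the slide's example.
ENTITY_INDEX = {
    "barack obama": "e1",
    "australia": "e2",
    "quintex australia": "e3",
    "quintex ltd.": "e3",
}


def link_mention(mention: str) -> Optional[str]:
    """Return the entity id for a mention, or None if it cannot be linked."""
    return ENTITY_INDEX.get(mention.strip().lower())


mentions = ["US President", "Barack Obama", "Australia",
            "Quintex Australia", "Quintex ltd."]
linked = {m: link_mention(m) for m in mentions}
```

In this sketch "US President" stays unlinked (`None`), which is the case the semantic typing step is meant to cover.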
Pre-Processing: Semantic Typing
Semantic Typing: recognized entities are typed; other mentions are
typed by string similarity with YAGO (YAGO Index).
Before: US President, e1: Barack Obama
After: t1: US President, e1: Barack Obama
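Typing an unlinked mention by string similarity could look roughly like the following. It is a sketch under stated assumptions: `TYPE_INDEX` is a hypothetical stand-in for a YAGO type index, the similarity measure is Python's `difflib.SequenceMatcher` rather than whatever the authors used, and the 0.8 threshold is arbitrary.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical stand-in for a YAGO type index: type id -> label.
TYPE_INDEX = {"t1": "US President", "t2": "company", "t3": "country"}


def type_mention(mention: str, threshold: float = 0.8) -> Optional[str]:
    """Return the id of the most similar type label, or None below the threshold."""
    best_id, best_score = None, 0.0
    for type_id, label in TYPE_INDEX.items():
        # ratio() is 2 * matches / total length, in [0, 1].
        score = SequenceMatcher(None, mention.lower(), label.lower()).ratio()
        if score > best_score:
            best_id, best_score = type_id, score
    return best_id if best_score >= threshold else None
```

With this table, `type_mention("the US president")` resolves to `t1`, while a company name such as "Quintex ltd." falls below the threshold and stays untyped.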
Cluster splits
Entity- and type-based splitting of clusters:
Before: (e2: Australia) (e3: Quintex Australia) (e3: Quintex ltd.)
After: (e3: Quintex Australia) (e3: Quintex ltd.) and (e2: Australia)
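A minimal sketch of entity-based splitting, assuming each mention is represented as a hypothetical `(surface form, entity id)` pair. Grouping by entity id is one straightforward reading of the slide, not necessarily the authors' exact procedure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Mention = Tuple[str, Optional[str]]  # (surface form, linked entity id or None)


def split_cluster(cluster: List[Mention]) -> List[List[Mention]]:
    """Split a cluster so that each part holds mentions of a single entity.

    Unlinked mentions (entity id None) are kept together in their own group;
    how to reassign them is a separate question.
    """
    groups: Dict[Optional[str], List[Mention]] = defaultdict(list)
    for mention in cluster:
        groups[mention[1]].append(mention)
    return list(groups.values())


cluster = [("Australia", "e2"), ("Quintex Australia", "e3"), ("Quintex ltd.", "e3")]
parts = split_cluster(cluster)
```

Applied to the slide's example, the country mention ends up alone and the two Quintex mentions stay together.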
Cluster splits: heuristics
1. Non-identified mention assignment, based on the words exclusive to each cluster:
   Obama ⇒ Barack Obama
   Jinping ⇒ Xi Jinping
2. Ignore mentions that are complete subsets of other identified mentions:
   ✕ Aspen (“Aspen Airways”)
   ✕ Obama (“Barack Obama”)
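Both heuristics can be approximated with set operations on lowercased words. The sketch below is one possible interpretation of the slide; the function names, the whitespace tokenization, and the cluster ids (`e1`, `e4`) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def words(mention: str) -> Set[str]:
    """Lowercased whitespace tokens of a mention."""
    return set(mention.lower().split())


def exclusive_words(clusters: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """For each cluster, collect the words that occur in no other cluster."""
    vocab = {cid: set().union(*(words(m) for m in ms)) for cid, ms in clusters.items()}
    result = {}
    for cid, own in vocab.items():
        others = set().union(*(v for other, v in vocab.items() if other != cid))
        result[cid] = own - others
    return result


def assign_unlinked(mention: str, clusters: Dict[str, List[str]]) -> Optional[str]:
    """Heuristic 1: pick the single cluster whose exclusive words overlap the mention."""
    exclusive = exclusive_words(clusters)
    hits = [cid for cid, excl in exclusive.items() if words(mention) & excl]
    return hits[0] if len(hits) == 1 else None


def is_subset_of_other(mention: str, identified: List[str]) -> bool:
    """Heuristic 2: True if the mention's words are a proper subset of another mention's."""
    return any(words(mention) < words(other) for other in identified)


clusters = {"e1": ["Barack Obama"], "e4": ["Xi Jinping"]}  # ids are illustrative
```

Here `assign_unlinked("Obama", clusters)` picks the Barack Obama cluster, and `is_subset_of_other("Aspen", ["Aspen", "Aspen Airways"])` flags "Aspen" as a mention to ignore.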
Cluster merges
Merge different clusters that contain the same types/entities:
Before: (t1: US President) and (e1: Barack Obama)
After: (e1: Barack Obama) (t1: US President)
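One way to sketch the merge step is to give every cluster a set of semantic keys (its entity and type ids, plus the types of its entities) and merge clusters whose keys overlap. The `ENTITY_TYPES` map is hypothetical, and treating "same types/entities" as key overlap is an assumption.

```python
from typing import Dict, List, Set

# Hypothetical entity -> types map; on the slide, e1 (Barack Obama)
# carries the type t1 (US President).
ENTITY_TYPES: Dict[str, Set[str]] = {"e1": {"t1"}}


def semantic_keys(labels: Set[str]) -> Set[str]:
    """Expand a cluster's labels with the types of its linked entities."""
    keys = set(labels)
    for label in labels:
        keys |= ENTITY_TYPES.get(label, set())
    return keys


def merge_clusters(clusters: List[Set[str]]) -> List[Set[str]]:
    """Merge clusters whose semantic keys overlap, including transitively."""
    merged: List[Set[str]] = []       # label sets of the merged clusters
    merged_keys: List[Set[str]] = []  # their expanded keys, kept pairwise disjoint
    for labels in clusters:
        labels, keys = set(labels), semantic_keys(labels)
        i = 0
        while i < len(merged):
            if merged_keys[i] & keys:
                # Absorb the earlier cluster into the current one.
                labels |= merged.pop(i)
                keys |= merged_keys.pop(i)
            else:
                i += 1
        merged.append(labels)
        merged_keys.append(keys)
    return merged


result = merge_clusters([{"e1"}, {"t1"}, {"e2"}])
```

In the example, the Barack Obama cluster and the US President cluster end up merged, while the Australia cluster is left alone.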
Evaluation
CoNLL-2012 Shared Task on Coreference Resolution:
• over 1M words
• 3 parts: development, training and test.
Methods are designed on the development set and evaluated on the test set.
Metrics:
• Precision/Recall/F1 for the case of clustering
• Noun-only clusters evaluated separately (no pronouns)
Cluster linking statistics
                     0 entities   1 entity   2 entities   3 entities
All Clusters               4175        849           49            5
Noun-Only Clusters         1208        502           33            2

Total clusters (Stanford Coref): 5078

                     To be merged   To be split
All Clusters                  270           118
Noun-Only Clusters             77            52
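The per-row totals and the share of clusters with at least one linked entity follow directly from the first table. The snippet below only does that arithmetic; the figures are copied from the slide and the variable names are illustrative.

```python
# Figures copied from the slide: number of clusters by count of linked entities.
all_clusters = {0: 4175, 1: 849, 2: 49, 3: 5}
noun_only = {0: 1208, 1: 502, 2: 33, 3: 2}


def linked_share(counts):
    """Return (total clusters, clusters with >= 1 linked entity, share linked)."""
    total = sum(counts.values())
    linked = total - counts[0]
    return total, linked, linked / total


total_all, linked_all, share_all = linked_share(all_clusters)
total_noun, linked_noun, share_noun = linked_share(noun_only)
print(f"all clusters: {linked_all}/{total_all} linked ({share_all:.1%})")
print(f"noun-only:    {linked_noun}/{total_noun} linked ({share_noun:.1%})")
```

The total for all clusters comes out to 5078, matching the figure reported for Stanford Coref.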
Cluster optimization results
• System improves on top of Stanford Coref in both split
and merge tasks.
• Greater improvement in split task for noun-only clusters,
since we do not re-assign pronouns.
Conclusions
• Leveraging semantic information improves coreference
resolution on top of existing NLP systems.
• Performance improves as entity and type linking improve.
• Complete evaluation code available at:
https://github.com/xi-lab/sanaphor
Roman Prokofyev (@rprokofyev)
eXascale Infolab (exascale.info), University of Fribourg, Switzerland
http://www.slideshare.net/eXascaleInfolab/
Anaphora vs. Coreference
“Do you have a cat? I love them.”
“a cat” is not an antecedent of “them”.
Metrics
• True positive (TP): two similar documents assigned to the same cluster.
• True negative (TN): two dissimilar documents assigned to different clusters.
• False positive (FP): two dissimilar documents assigned to the same cluster.
• False negative (FN): two similar documents assigned to different clusters.
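These pairwise definitions translate into a short counting routine, from which precision, recall and F1 follow in the usual way. This is a sketch, assuming each item (for coreference, presumably a mention) is mapped to a gold and a predicted cluster id; it is not the official CoNLL scorer.

```python
from itertools import combinations
from typing import Dict, Hashable, Tuple


def pair_counts(gold: Dict[Hashable, int],
                pred: Dict[Hashable, int]) -> Tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN) over all unordered pairs of items.

    `gold` and `pred` map each item to a cluster id and must cover the same items.
    """
    tp = tn = fp = fn = 0
    for a, b in combinations(list(gold), 2):
        same_gold = gold[a] == gold[b]
        same_pred = pred[a] == pred[b]
        if same_gold and same_pred:
            tp += 1
        elif not same_gold and not same_pred:
            tn += 1
        elif same_pred:
            fp += 1  # clustered together, but should be apart
        else:
            fn += 1  # kept apart, but should be together
    return tp, tn, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Pairwise precision, recall and F1; each is 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {"a": 0, "b": 0, "c": 1, "d": 1}
pred = {"a": 0, "b": 0, "c": 0, "d": 1}
tp, tn, fp, fn = pair_counts(gold, pred)
p, r, f1 = precision_recall_f1(tp, fp, fn)
```

In the toy example, wrongly placing "c" in the first cluster yields two false positive pairs and one false negative pair.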

SANAPHOR: Ontology-based Coreference Resolution

  • 1. SANAPHOR: Ontology-Based Coreference Resolution Roman Prokofyev, Alberto Tonon, Michael Luggen, Loic Vouilloz, Djellel Difallah and Philippe Cudré-Mauroux eXascale Infolab University of Fribourg, Switzerland October 14th, ISWC’15 Bethlehem PA, USA 1
  • 2. Motivations and Task Overview 2 Task: identify groups (cluster) of co-referring mentions. Example: “Xi Jinping was due to arrive in Washington for a dinner with Barack Obama on Thursday night, in which he will aim to reassure the US president about a rising China. The Chinese president said he favors a “new model of major country relationship" built on understanding, rather than suspicion.” http://www.telegraph.co.uk/ Benefits: • identification of a specific type of an unknown entity • extract more relationships between named entities
  • 3. State-of-Art in Coreference Resolution Best approaches use generic multi-step algorithm: 1. Pre-processing (POS tagging, parsing, NER) 2. Identification of referring expressions (e.g., pronouns) 3. Anaphoricity determination (“it rains” vs “he took it”) 4. Generation of antecedent candidates 5. Searching/Clustering of candidates Lee et al., Stanford’s multi-pass sieve coreference resolution system at the conll-2011 shared task 3
  • 4. Motivations for a rich semantic layer 4 http://www.telegraph.co.uk/ “Xi Jinping was due to arrive in Washington for a dinner with Barack Obama on Thursday night, in which he will aim to reassure the US president about a rising China. The Chinese president said he favors a “new model of major country relationship" built on understanding, rather than suspicion.” Syntactic approaches are not able to differentiate between the names of the city and the province.
  • 5. Semantic layer on top of an existing system. Diagram: Documents → Stanford Coref (Deterministic Coreference Resolution) → clustered mentions: [US President] [Barack Obama] [Australia] [Quintex Australia] [Quintex ltd.]
  • 6. Generic overview of the approach. Key techniques: split and merge clusters based on their semantics. SANAPHOR pipeline: clusters produced by Stanford Coref → Entity/Type Linking → Split clusters → Merge clusters
  • 7. Pre-Processing: Entity Linking. Input mentions: US President; Barack Obama; Australia; Quintex Australia; Quintex ltd. After entity linking: US President (no entity); e1: Barack Obama; e2: Australia; e3: Quintex Australia; e3: Quintex ltd.
  • 8. Pre-Processing: Semantic Typing. Recognized entities are typed; other mentions are typed by string similarity with YAGO (YAGO index). Example: US President; e1: Barack Obama → t1: US President; e1: Barack Obama
  • 9. Cluster splits. Entity- and type-based splitting of clusters. Example: the cluster (e2: Australia) (e3: Quintex Australia) (e3: Quintex ltd.) is split into [e3: Quintex Australia, e3: Quintex ltd.] and [e2: Australia]
  • 10. Cluster splits: heuristics. 1. Non-identified mention assignment, based on exclusive words in each cluster: Obama ⇒ Barack Obama; Jinping ⇒ Xi Jinping. 2. Ignore complete subsets of other identified mentions: ✕ Aspen (“Aspen Airways”); ✕ Obama (“Barack Obama”)
  • 11. Cluster merges. Merge different clusters that contain the same types/entities. Example: the clusters [t1: US President] and [e1: Barack Obama] are merged into (e1: Barack Obama) (t1: US President)
  • 12. Evaluation. CoNLL-2012 Shared Task on Coreference Resolution: • over 1M words • 3 parts: development, training and test. Methods are designed on the development set and evaluated on the test set. Metrics: • Precision/Recall/F1 for the case of clustering • Noun-only clusters (no pronouns) are evaluated separately
  • 13. Cluster linking statistics. Total clusters (Stanford Coref): 5078
      Clusters by number of linked entities:
                            0 entities   1 entity   2 entities   3 entities
      All Clusters          4175         849        49           5
      Noun-Only Clusters    1208         502        33           2
      Clusters to be merged or split:
                            To be merged   To be split
      All Clusters          270            118
      Noun-Only Clusters    77             52
  • 14. Cluster optimization results • The system improves on Stanford Coref in both the split and merge tasks. • The improvement in the split task is greater for noun-only clusters, since we do not re-assign pronouns.
  • 15. Conclusions • Leveraging semantic information improves coreference resolution on top of existing NLP systems. • The performance improves with the improvement of entity and type linking. • Complete evaluation code available at: https://github.com/xi-lab/sanaphor Roman Prokofyev (@rprokofyev), eXascale Infolab (exascale.info), University of Fribourg, Switzerland http://www.slideshare.net/eXascaleInfolab/
  • 16. Anaphora vs. Coreference. “Do you have a cat? I love them.” “a cat” is not an antecedent of “them”.
  • 17. Metrics • True positive (TP): two similar documents assigned to the same cluster. • True negative (TN): two dissimilar documents assigned to different clusters. • False positive (FP): two dissimilar documents assigned to the same cluster. • False negative (FN): two similar documents assigned to different clusters.
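
The pair-based definitions on the Metrics slide can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code from the SANAPHOR repository and not the official CoNLL scorer: it assumes that the "documents" on the slide stand for mentions, that "similar" means "in the same gold cluster", and the names `pairwise_counts`, `precision_recall_f1`, `gold` and `predicted` are hypothetical.

```python
from itertools import combinations


def pairwise_counts(gold, predicted):
    """Count pair-level TP, FP, FN, TN for two clusterings of the same items.

    `gold` and `predicted` map each item (assumed here: a mention id)
    to a cluster label.
    """
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(gold), 2):
        same_gold = gold[a] == gold[b]
        same_pred = predicted[a] == predicted[b]
        if same_gold and same_pred:
            tp += 1  # co-referring pair kept together
        elif same_pred:
            fp += 1  # unrelated pair wrongly put together
        elif same_gold:
            fn += 1  # co-referring pair wrongly separated
        else:
            tn += 1  # unrelated pair kept apart
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from the pair counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (made up for illustration): four mentions, two gold entities.
gold = {"m1": "xi", "m2": "xi", "m3": "obama", "m4": "obama"}
predicted = {"m1": "c1", "m2": "c1", "m3": "c1", "m4": "c2"}

counts = pairwise_counts(gold, predicted)
precision, recall, f1 = precision_recall_f1(*counts[:3])
print(counts, round(precision, 3), round(recall, 3), round(f1, 3))
```

True negatives are counted but do not enter precision, recall or F1, which is why clustering evaluations usually report those three rather than accuracy.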

Editor's Notes

  1. Welcome everyone, my name is Roman Prokofyev, and I’m a PhD student at the eXascale Infolab, at the University of Fribourg, Switzerland. Today I will present our joint work on ontology-based coreference resolution.
  2. I’ll start with an overview of the task we are solving here.
  3. So, currently, the standard way to resolve coreferences is by means of a multi-step approach which was developed over the years.
  4. However, a purely NLP-based approach fails to determine the correct coreference cluster when the referring phrases are somewhat ambiguous.
  5. In our work we introduce a semantic layer on top of an existing system that allows us to rearrange coreference clusters based on their semantics. Stanford produces so-called clusters…
  6. Thus, we have designed the following pipeline for our system. Let’s see how each box operates in detail.
  7. First step of our pipeline, … spotlight – decent technology
  8. Beyond entity linking (EL), the next pre-processing step is semantic typing…
  9. Now that we have completed the necessary pre-processing steps, we start re-arranging the coreference clusters. The first step is to split semantically unrelated clusters, that is, clusters that contain either different entities or types from different branches of the type hierarchy.
  10. We identified the following problems
  11. The second step is cluster merging, that is, merging clusters that contain either the same entities, or exactly the same types, or, in case there is a mix of types and entities,…
  12. OntoNotes 5: available on LDC for free, 1M words from newswire, magazine articles, web data
  13. First, we evaluate the quality of our entity linking step
  14. We notice that the absolute increase in F1 score for the split task is greater for the Noun-Only case (+10.54% vs +2.94%). This results from the fact that All Clusters also contain non-noun mentions, such as pronouns, which we don’t directly tackle in this work but which nevertheless have to be assigned to one of the splits. Our approach in that context is to keep the non-noun mentions with the first noun mention in the cluster, which seems to be suboptimal for this case. For the merge task, the difference between All and Noun-Only clusters is much smaller (+27.03% for All Clusters vs +18.96% for the Noun-Only case). In this case, non-noun words do not have any effect, since we merge whole clusters, including all their other mentions.
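
The split and merge steps described in notes 9, 11 and 14 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses entity links only (the type-based and mixed type/entity cases are left out), it treats every unlinked mention like a pronoun and keeps it with the first linked group, and it skips the exclusive-word heuristics from slide 10. The `Mention` record, the function names and the example linking results are hypothetical.

```python
from collections import namedtuple

# Hypothetical mention record: `entity` is a linked entity id, or None
# when the mention could not be linked (e.g. a pronoun).
Mention = namedtuple("Mention", ["text", "entity"])


def split_cluster(cluster):
    """Split one cluster into sub-clusters, one per distinct linked entity.

    Simplification: all unlinked mentions stay with the first linked group.
    """
    groups = {}  # entity id -> mentions; dicts preserve insertion order
    unlinked = []
    for mention in cluster:
        if mention.entity is None:
            unlinked.append(mention)
        else:
            groups.setdefault(mention.entity, []).append(mention)
    if not groups:
        return [list(cluster)]  # nothing linked, so nothing to split on
    first_entity = next(iter(groups))
    groups[first_entity].extend(unlinked)
    return list(groups.values())


def merge_clusters(clusters):
    """Merge clusters that share at least one linked entity id."""
    merged = []  # list of (entity id set, mention list), pairwise disjoint
    for cluster in clusters:
        entities = {m.entity for m in cluster if m.entity is not None}
        mentions = list(cluster)
        kept = []
        for other_entities, other_mentions in merged:
            if entities & other_entities:
                # Absorb the earlier cluster, keeping its mentions first.
                entities |= other_entities
                mentions = other_mentions + mentions
            else:
                kept.append((other_entities, other_mentions))
        kept.append((entities, mentions))
        merged = kept
    return [mentions for _, mentions in merged]


# Split example, using the entity ids from slide 9 plus a made-up pronoun.
quintex_cluster = [
    Mention("Australia", "e2"),
    Mention("Quintex Australia", "e3"),
    Mention("Quintex ltd.", "e3"),
    Mention("it", None),
]
split_result = [[m.text for m in c] for c in split_cluster(quintex_cluster)]

# Merge example: two clusters assumed to be linked to the same entity e1.
merge_input = [
    [Mention("Barack Obama", "e1")],
    [Mention("Mr. Obama", "e1"), Mention("he", None)],
    [Mention("Australia", "e2")],
]
merge_result = [[m.text for m in c] for c in merge_clusters(merge_input)]
print(split_result)
print(merge_result)
```

In the split example the pronoun "it" ends up with "Australia" simply because that is the first linked mention, which illustrates why the note calls this assignment suboptimal for All Clusters. In the merge example the pronoun travels with its cluster, so it has no effect on the merge decision.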