SlideShare a Scribd company logo
1 of 25
Download to read offline
Alberto Parravicini
Rhicheek Patra
Davide Bartolini
Marco Santambrogio
2019-05-{17-31}, NGCX
Fast Entity
Linking via Graph
Embeddings
Alberto Parravicini 23/05/2019
Entity Linking (EL): connecting words of
interest to unique identities (e.g. Wikipedia
Page)
Entity Linking
2
Alberto Parravicini 23/05/2019
Component of applications that require
high-level representations of text:
1. Search Engines, for semantic search
2. Recommender Systems, to retrieve documents
similar to each other
3. Chat bots, to understand intents and entities
Use Cases
3
Alberto Parravicini 23/05/2019
An EL system requires 2 steps:
1. Named Entity Recognition (NER): spot
mentions (a.k.a. Named Entities)
● High-accuracy in the state-of-the-art[1]
The EL Pipeline (1/2)
[1] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging."
4
Alberto Parravicini 23/05/2019
An EL system requires 2 steps:
2. Entity Linking: connect mentions to entities
The EL Pipeline (2/2)
5
Alberto Parravicini 23/05/2019
An EL system requires 2 steps:
2. Entity Linking: connect mentions to entities
The EL Pipeline (2/2)
6
Alberto Parravicini 23/05/2019
An EL system requires 2 steps:
2. Entity Linking: connect mentions to entities
The EL Pipeline (2/2)
7
Alberto Parravicini 23/05/2019
● Novel unsupervised framework for EL
● No dependency on NLP
● First EL algorithm to use graph embeddings
● Accuracy similar to supervised SoA techniques
● Highly scalable and real-time execution time
● < 1 sec to process text with 30+ mentions
Our contributions
8
Alberto Parravicini 23/05/2019
Our EL Pipeline
Input text
with NER
Graph Building Embeddings Candidate Finder
Disambiguation
Graph Learning
Execution
Output
1 2 3 4
9
Alberto Parravicini 23/05/2019
We obtain a large graph from DBpedia
● All the information of Wikipedia, stored as triples
● 12M entities, 170M links
Step 1/4
Graph Creation
10
Alberto Parravicini 23/05/2019
From DBPedia, we build two graphs
Redirects GraphProperty Graph
Step 1/4
Graph Creation
11
Alberto Parravicini 23/05/2019
● Graph embeddings encode vertices as vectors
● “Similar” vertices have “similar” embeddings
● Idea: entities with the same context should have
low embedding distance
Embeddings Creation (1/2)
Step 2/4
12
Alberto Parravicini 23/05/2019
● In our work, we use DeepWalk[1]
● Like word2vec[2]
, it leverages random walks (i.e.
vertex sequences) to create embeddings
● Embedding size 170, walk length 8
● DeepWalk uses only the graph topology
● Simple baseline, we can use better algorithms
and leverage graph features
Embeddings Creation (2/2)
[1] Bryan Perozzi et al.2014. Deepwalk
[2] Tomas Mikolov et al. 2013. Distributed
representations of words and phrases and
their compositionality
Step 2/4
13
Alberto Parravicini 23/05/2019
Candidates Finder
Idea: for each mention, select a few candidate
vertices with index- based string similarity
● Solve ambiguity following redirect and
disambiguation edges
Step 3/4
14
Alberto Parravicini 23/05/2019
Candidates Finder
Step 3/4
Idea: for each mention, select a few candidate
vertices with index- based string similarity
● Solve ambiguity following redirect and
disambiguation edges
15
Alberto Parravicini 23/05/2019
Disambiguation (1/3)
●We want to pick the “best” candidate for
each mention
● In a “good” solution, candidates are related to
each other (e.g. Donald Trump, Hillary Clinton)
●Observation: a good tuple of candidates has
embeddings close to each other
●Evaluating all combinations
is infeasible
● 10 mentions with 100 candidates 10010
Step 4/4
16
Alberto Parravicini 23/05/2019
17
●We use an heuristic state-space search
algorithm to maximize:
Sum of string
similarities
Step 4/4
Sum of embedding
cosine similarities
w.r.t. tuple mean
Disambiguation (2/3)
Alberto Parravicini 23/05/2019
● Iterative state-space heuristic
Greedy
iterative
procedure
Step 4/4
18
Disambiguation (3/3)
Alberto Parravicini 23/05/2019
● We compared against 6 SoA EL algorithms, on 5
datasets
● Our Micro-averaged F1 score is comparable with
SoA supervised algorithms
Results: accuracy
19
Alberto Parravicini 23/05/2019
● Different settings enable real-time EL, with
minimal loss in accuracy
● E.g. number of iterations, early stop
Results: exec. time
20
Alberto Parravicini 23/05/2019
● Execution time is well divided between
Candidate Finder and Disambiguation
Results: exec. time of
single steps
21
Thank you!
Fast Entity Linking via Graph Embeddings
● Novel unsupervised framework for EL
● First EL algorithm to use graph embeddings
○ Accuracy similar to supervised SoA techniques
● Real-time execution time
Alberto Parravicini, alberto.parravicini@polimi.it
Rhicheek Patra
Davide Bartolini
Marco Santambrogio
2019-05-{17-31}, NGCX
Alberto Parravicini 23/05/2019
●Turn topology and properties of each vertex
into a vector
●“Similar” vertices have “similar” embeddings
Embeddings
Step 2/4
embedding
properties
topology
Hamilton, William L., Rex Ying, and Jure Leskovec.
"Representation learning on graphs: Methods and
applications." arXiv preprint arXiv:1709.05584 (2017).
Alberto Parravicini 23/05/2019
… and join them together
Step 1/4
Graph Creation
Alberto Parravicini 23/05/2019
Candidates Finder
Idea: for each mention, select a small number of
candidate vertices with string similarity
● We use a simple index-based string search
● Fuzzy matching with 2-grams and 3-grams
● This provide a simple baseline (60-70%
accuracy)
Step 3/4

More Related Content

What's hot

Configuration of classes
Configuration of classesConfiguration of classes
Configuration of classesanumshakeel5
 
Media4Math's Algebra Series
Media4Math's Algebra SeriesMedia4Math's Algebra Series
Media4Math's Algebra SeriesMedia4math
 
Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Yannis Kalfoglou
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in PythonAjay Ohri
 
Knowledge Graphs and Milestone
Knowledge Graphs and MilestoneKnowledge Graphs and Milestone
Knowledge Graphs and MilestoneBarry Norton
 
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)Igalia
 
Fast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeFast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeChristopher Conlan
 
Integrate with Tracing and Logging
Integrate with Tracing and LoggingIntegrate with Tracing and Logging
Integrate with Tracing and LoggingAya Ozawa (Igarashi)
 
Unit 1 polynomial manipulation
Unit 1   polynomial manipulationUnit 1   polynomial manipulation
Unit 1 polynomial manipulationLavanyaJ28
 

What's hot (13)

Configuration of classes
Configuration of classesConfiguration of classes
Configuration of classes
 
Bio 180208
Bio 180208Bio 180208
Bio 180208
 
Rust presentation convergeconf
Rust presentation convergeconfRust presentation convergeconf
Rust presentation convergeconf
 
Media4Math's Algebra Series
Media4Math's Algebra SeriesMedia4Math's Algebra Series
Media4Math's Algebra Series
 
Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002Information Flow based Ontology Mapping - 2002
Information Flow based Ontology Mapping - 2002
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
Knowledge Graphs and Milestone
Knowledge Graphs and MilestoneKnowledge Graphs and Milestone
Knowledge Graphs and Milestone
 
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
BigInt: Integers as big as you want in JavaScript (Web Engines Hackfest 2017)
 
Fast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster CodeFast Python: Master the Basics to Write Faster Code
Fast Python: Master the Basics to Write Faster Code
 
brf to mathml
brf to mathmlbrf to mathml
brf to mathml
 
Intro to Elixir
Intro to ElixirIntro to Elixir
Intro to Elixir
 
Integrate with Tracing and Logging
Integrate with Tracing and LoggingIntegrate with Tracing and Logging
Integrate with Tracing and Logging
 
Unit 1 polynomial manipulation
Unit 1   polynomial manipulationUnit 1   polynomial manipulation
Unit 1 polynomial manipulation
 

Similar to Fast Entity Linking via Graph Embeddings

Gospel - High-performance heterogeneous architectures for graph analytics
 Gospel - High-performance heterogeneous architectures for graph analytics Gospel - High-performance heterogeneous architectures for graph analytics
Gospel - High-performance heterogeneous architectures for graph analyticsNECST Lab @ Politecnico di Milano
 
IRJET- Python Based Machine Learning for Profile Matching
IRJET- 	  Python Based Machine Learning for Profile MatchingIRJET- 	  Python Based Machine Learning for Profile Matching
IRJET- Python Based Machine Learning for Profile MatchingIRJET Journal
 
Pattern-based Definition and Generation of Components for a Synchronous React...
Pattern-based Definition and Generation of Components for a Synchronous React...Pattern-based Definition and Generation of Components for a Synchronous React...
Pattern-based Definition and Generation of Components for a Synchronous React...ijeukens
 
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGFEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGIJCI JOURNAL
 
Colombo+ronzoni+fontana
Colombo+ronzoni+fontanaColombo+ronzoni+fontana
Colombo+ronzoni+fontanaAjay Ohri
 
Master in Big Data Analytics and Social Mining 20015
Master in Big Data Analytics and Social Mining 20015Master in Big Data Analytics and Social Mining 20015
Master in Big Data Analytics and Social Mining 20015Andrea Gigli
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016ijcsbi
 
cis5-204-Project-ch11c - Evan, Le, Mata.pdf
cis5-204-Project-ch11c - Evan, Le, Mata.pdfcis5-204-Project-ch11c - Evan, Le, Mata.pdf
cis5-204-Project-ch11c - Evan, Le, Mata.pdfMinhLe595264
 
CML's Presentation at FengChia University
CML's Presentation at FengChia UniversityCML's Presentation at FengChia University
CML's Presentation at FengChia UniversityTunghai University
 
IRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET Journal
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
 
Locloud - D2.6: Crawler ready tagging tools
Locloud - D2.6: Crawler ready tagging toolsLocloud - D2.6: Crawler ready tagging tools
Locloud - D2.6: Crawler ready tagging toolslocloud
 
IRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET Journal
 
Towards a Resource Slice Interoperability Hub for IoT
Towards a Resource Slice Interoperability Hub for IoTTowards a Resource Slice Interoperability Hub for IoT
Towards a Resource Slice Interoperability Hub for IoTHong-Linh Truong
 
IRJET- Hosting NLP based Chatbot on AWS Cloud using Docker
IRJET-  	  Hosting NLP based Chatbot on AWS Cloud using DockerIRJET-  	  Hosting NLP based Chatbot on AWS Cloud using Docker
IRJET- Hosting NLP based Chatbot on AWS Cloud using DockerIRJET Journal
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonIRJET Journal
 
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
IRJET -  	  Speech to Speech Translation using Encoder Decoder ArchitectureIRJET -  	  Speech to Speech Translation using Encoder Decoder Architecture
IRJET - Speech to Speech Translation using Encoder Decoder ArchitectureIRJET Journal
 

Similar to Fast Entity Linking via Graph Embeddings (20)

Gospel - High-performance heterogeneous architectures for graph analytics
 Gospel - High-performance heterogeneous architectures for graph analytics Gospel - High-performance heterogeneous architectures for graph analytics
Gospel - High-performance heterogeneous architectures for graph analytics
 
IRJET- Python Based Machine Learning for Profile Matching
IRJET- 	  Python Based Machine Learning for Profile MatchingIRJET- 	  Python Based Machine Learning for Profile Matching
IRJET- Python Based Machine Learning for Profile Matching
 
Pattern-based Definition and Generation of Components for a Synchronous React...
Pattern-based Definition and Generation of Components for a Synchronous React...Pattern-based Definition and Generation of Components for a Synchronous React...
Pattern-based Definition and Generation of Components for a Synchronous React...
 
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGFEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
 
Colombo+ronzoni+fontana
Colombo+ronzoni+fontanaColombo+ronzoni+fontana
Colombo+ronzoni+fontana
 
Master in Big Data Analytics and Social Mining 20015
Master in Big Data Analytics and Social Mining 20015Master in Big Data Analytics and Social Mining 20015
Master in Big Data Analytics and Social Mining 20015
 
Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016Vol 16 No 2 - July-December 2016
Vol 16 No 2 - July-December 2016
 
CIS 5 Project.pdf
CIS 5 Project.pdfCIS 5 Project.pdf
CIS 5 Project.pdf
 
cis5-204-Project-ch11c - Evan, Le, Mata.pdf
cis5-204-Project-ch11c - Evan, Le, Mata.pdfcis5-204-Project-ch11c - Evan, Le, Mata.pdf
cis5-204-Project-ch11c - Evan, Le, Mata.pdf
 
CML's Presentation at FengChia University
CML's Presentation at FengChia UniversityCML's Presentation at FengChia University
CML's Presentation at FengChia University
 
IRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET- Semantics based Document Clustering
IRJET- Semantics based Document Clustering
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 
Locloud - D2.6: Crawler ready tagging tools
Locloud - D2.6: Crawler ready tagging toolsLocloud - D2.6: Crawler ready tagging tools
Locloud - D2.6: Crawler ready tagging tools
 
IRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLP
 
Towards Edge Computing as a Service: Dynamic Formation of the Micro Data-Centers
Towards Edge Computing as a Service: Dynamic Formation of the Micro Data-CentersTowards Edge Computing as a Service: Dynamic Formation of the Micro Data-Centers
Towards Edge Computing as a Service: Dynamic Formation of the Micro Data-Centers
 
Towards a Resource Slice Interoperability Hub for IoT
Towards a Resource Slice Interoperability Hub for IoTTowards a Resource Slice Interoperability Hub for IoT
Towards a Resource Slice Interoperability Hub for IoT
 
IRJET- Hosting NLP based Chatbot on AWS Cloud using Docker
IRJET-  	  Hosting NLP based Chatbot on AWS Cloud using DockerIRJET-  	  Hosting NLP based Chatbot on AWS Cloud using Docker
IRJET- Hosting NLP based Chatbot on AWS Cloud using Docker
 
Comparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & PythonComparing the performance of a business process: using Excel & Python
Comparing the performance of a business process: using Excel & Python
 
Visual Network Narrations
Visual Network NarrationsVisual Network Narrations
Visual Network Narrations
 
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
IRJET -  	  Speech to Speech Translation using Encoder Decoder ArchitectureIRJET -  	  Speech to Speech Translation using Encoder Decoder Architecture
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
 

More from NECST Lab @ Politecnico di Milano

Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingNECST Lab @ Politecnico di Milano
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...NECST Lab @ Politecnico di Milano
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification SystemNECST Lab @ Politecnico di Milano
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingNECST Lab @ Politecnico di Milano
 

More from NECST Lab @ Politecnico di Milano (20)

Mesticheria Team - WiiReflex
Mesticheria Team - WiiReflexMesticheria Team - WiiReflex
Mesticheria Team - WiiReflex
 
Punto e virgola Team - Stressometro
Punto e virgola Team - StressometroPunto e virgola Team - Stressometro
Punto e virgola Team - Stressometro
 
BitIt Team - Stay.straight
BitIt Team - Stay.straight BitIt Team - Stay.straight
BitIt Team - Stay.straight
 
BabYodini Team - Talking Gloves
BabYodini Team - Talking GlovesBabYodini Team - Talking Gloves
BabYodini Team - Talking Gloves
 
printf("Nome Squadra"); Team - NeoTon
printf("Nome Squadra"); Team - NeoTonprintf("Nome Squadra"); Team - NeoTon
printf("Nome Squadra"); Team - NeoTon
 
BlackBoard Team - Motion Tracking Platform
BlackBoard Team - Motion Tracking PlatformBlackBoard Team - Motion Tracking Platform
BlackBoard Team - Motion Tracking Platform
 
#include<brain.h> Team - HomeBeatHome
#include<brain.h> Team - HomeBeatHome#include<brain.h> Team - HomeBeatHome
#include<brain.h> Team - HomeBeatHome
 
Flipflops Team - Wave U
Flipflops Team - Wave UFlipflops Team - Wave U
Flipflops Team - Wave U
 
Bug(atta) Team - Little Brother
Bug(atta) Team - Little BrotherBug(atta) Team - Little Brother
Bug(atta) Team - Little Brother
 
#NECSTCamp: come partecipare
#NECSTCamp: come partecipare#NECSTCamp: come partecipare
#NECSTCamp: come partecipare
 
NECSTCamp101@2020.10.1
NECSTCamp101@2020.10.1NECSTCamp101@2020.10.1
NECSTCamp101@2020.10.1
 
NECSTLab101 2020.2021
NECSTLab101 2020.2021NECSTLab101 2020.2021
NECSTLab101 2020.2021
 
TreeHouse, nourish your community
TreeHouse, nourish your communityTreeHouse, nourish your community
TreeHouse, nourish your community
 
TiReX: Tiled Regular eXpressionsmatching architecture
TiReX: Tiled Regular eXpressionsmatching architectureTiReX: Tiled Regular eXpressionsmatching architecture
TiReX: Tiled Regular eXpressionsmatching architecture
 
Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposing
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification System
 
Luns - Automatic lungs segmentation through neural network
Luns - Automatic lungs segmentation through neural networkLuns - Automatic lungs segmentation through neural network
Luns - Automatic lungs segmentation through neural network
 
BlastFunction: How to combine Serverless and FPGAs
BlastFunction: How to combine Serverless and FPGAsBlastFunction: How to combine Serverless and FPGAs
BlastFunction: How to combine Serverless and FPGAs
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matching
 

Recently uploaded

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction managementMariconPadriquez1
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 

Recently uploaded (20)

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction management
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 

Fast Entity Linking via Graph Embeddings

  • 1. Alberto Parravicini Rhicheek Patra Davide Bartolini Marco Santambrogio 2019-05-{17-31}, NGCX Fast Entity Linking via Graph Embeddings
  • 2. Alberto Parravicini 23/05/2019 Entity Linking (EL): connecting words of interest to unique identities (e.g. Wikipedia Page) Entity Linking 2
  • 3. Alberto Parravicini 23/05/2019 Component of applications that require high-level representations of text: 1. Search Engines, for semantic search 2. Recommender Systems, to retrieve documents similar to each other 3. Chat bots, to understand intents and entities Use Cases 3
  • 4. Alberto Parravicini 23/05/2019 An EL system requires 2 steps: 1. Named Entity Recognition (NER): spot mentions (a.k.a. Named Entities) ● High-accuracy in the state-of-the-art[1] The EL Pipeline (1/2) [1] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." 4
  • 5. Alberto Parravicini 23/05/2019 An EL system requires 2 steps: 2. Entity Linking: connect mentions to entities The EL Pipeline (2/2) 5
  • 6. Alberto Parravicini 23/05/2019 An EL system requires 2 steps: 2. Entity Linking: connect mentions to entities The EL Pipeline (2/2) 6
  • 7. Alberto Parravicini 23/05/2019 An EL system requires 2 steps: 2. Entity Linking: connect mentions to entities The EL Pipeline (2/2) 7
  • 8. Alberto Parravicini 23/05/2019 ● Novel unsupervised framework for EL ● No dependency on NLP ● First EL algorithm to use graph embeddings ● Accuracy similar to supervised SoA techniques ● Highly scalable and real-time execution time ● < 1 sec to process text with 30+ mentions Our contributions 8
  • 9. Alberto Parravicini 23/05/2019 Our EL Pipeline Input text with NER Graph Building Embeddings Candidate Finder Disambiguation Graph Learning Execution Output 1 2 3 4 9
  • 10. Alberto Parravicini 23/05/2019 We obtain a large graph from DBpedia ● All the information of Wikipedia, stored as triples ● 12M entities, 170M links Step 1/4 Graph Creation 10
  • 11. Alberto Parravicini 23/05/2019 From DBPedia, we build two graphs Redirects GraphProperty Graph Step 1/4 Graph Creation 11
  • 12. Alberto Parravicini 23/05/2019 ● Graph embeddings encode vertices as vectors ● “Similar” vertices have “similar” embeddings ● Idea: entities with the same context should have low embedding distance Embeddings Creation (1/2) Step 2/4 12
  • 13. Alberto Parravicini 23/05/2019 ● In our work, we use DeepWalk[1] ● Like word2vec[2] , it leverages random walks (i.e. vertex sequences) to create embeddings ● Embedding size 170, walk length 8 ● DeepWalk uses only the graph topology ● Simple baseline, we can use better algorithms and leverage graph features Embeddings Creation (2/2) [1] Bryan Perozzi et al.2014. Deepwalk [2] Tomas Mikolov et al. 2013. Distributed representations of words and phrases and their compositionality Step 2/4 13
  • 14. Alberto Parravicini 23/05/2019 Candidates Finder Idea: for each mention, select a few candidate vertices with index- based string similarity ● Solve ambiguity following redirect and disambiguation edges Step 3/4 14
  • 15. Alberto Parravicini 23/05/2019 Candidates Finder Step 3/4 Idea: for each mention, select a few candidate vertices with index- based string similarity ● Solve ambiguity following redirect and disambiguation edges 15
  • 16. Alberto Parravicini 23/05/2019 Disambiguation (1/3) ●We want to pick the “best” candidate for each mention ● In a “good” solution, candidates are related to each other (e.g. Donald Trump, Hillary Clinton) ●Observation: a good tuple of candidates has embeddings close to each other ●Evaluating all combinations is infeasible ● 10 mentions with 100 candidates 10010 Step 4/4 16
  • 17. Alberto Parravicini 23/05/2019 17 ●We use an heuristic state-space search algorithm to maximize: Sum of string similarities Step 4/4 Sum of embedding cosine similarities w.r.t. tuple mean Disambiguation (2/3)
  • 18. Alberto Parravicini 23/05/2019 ● Iterative state-space heuristic Greedy iterative procedure Step 4/4 18 Disambiguation (3/3)
  • 19. Alberto Parravicini 23/05/2019 ● We compared against 6 SoA EL algorithms, on 5 datasets ● Our Micro-averaged F1 score is comparable with SoA supervised algorithms Results: accuracy 19
  • 20. Alberto Parravicini 23/05/2019 ● Different settings enable real-time EL, with minimal loss in accuracy ● E.g. number of iterations, early stop Results: exec. time 20
  • 21. Alberto Parravicini 23/05/2019 ● Execution time is well divided between Candidate Finder and Disambiguation Results: exec. time of single steps 21
  • 22. Thank you! Fast Entity Linking via Graph Embeddings ● Novel unsupervised framework for EL ● First EL algorithm to use graph embeddings ○ Accuracy similar to supervised SoA techniques ● Real-time execution time Alberto Parravicini, alberto.parravicini@polimi.it Rhicheek Patra Davide Bartolini Marco Santambrogio 2019-05-{17-31}, NGCX
  • 23. Alberto Parravicini 23/05/2019 ●Turn topology and properties of each vertex into a vector ●“Similar” vertices have “similar” embeddings Embeddings Step 2/4 embedding properties topology Hamilton, William L., Rex Ying, and Jure Leskovec. "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584 (2017).
  • 24. Alberto Parravicini 23/05/2019 … and join them together Step 1/4 Graph Creation
  • 25. Alberto Parravicini 23/05/2019 Candidates Finder Idea: for each mention, select a small number of candidate vertices with string similarity ● We use a simple index-based string search ● Fuzzy matching with 2-grams and 3-grams ● This provide a simple baseline (60-70% accuracy) Step 3/4