PIKAKSHI MANCHANDA
DISCo, University of Milano-Bicocca, Milan, Italy
@pikakshi787
Manchanda et al., Leveraging Entity Linking for Entity Recognition in Microposts
KDIR 2015
KDIR 2015, Lisbon, 12th November 2015
▪ People increasingly communicate and share information through social media platforms
▪ Fresh information emerges in real time on social media platforms, primarily:
  ➢ New entities (newly emerging, newly relevant/popular)
  ➢ New relationships
  ➢ Factual information
  ➢ Events
SOCIAL MEDIA: ENTITIES-EMOJIS-EVENTS
WHY INFORMATION EXTRACTION??
[Figure: existing entities and a new entity (product launch) with new relations, drawn from the Unstructured Web. Labeled entities: Apple Watch (Product), IBM OS2 (Product), Apple (Company)]
WHY SOCIAL MEDIA PLATFORMS??
▪ Fresh
▪ Real-time info
▪ Incomplete KBs
MOTIVATION
▪ Bridging the gap between the Unstructured Web and the Web of Data
  • Intrinsic incompleteness in KBs
▪ Information Extraction from social media streams (microposts, ...)
  • Named Entity Recognition (NER)
  • Named Entity Classification
  • Named Entity Linking (NEL)
▪ Knowledge Base (KB) enrichment
  • Identify new knowledge
  • Improve NER
  • Lexically enrich knowledge bases for existing & new entities
INFORMATION EXTRACTION
▪ Named Entity Recognition: Task of identifying named entities in a piece of text
▪ Named Entities: Text fragments that refer to entities in the real world (proper nouns, ...)
▪ Named Entity Classification: Classifying recognized named entities into entity types such as PERSON, LOCATION, ORGANIZATION…
▪ Named Entity Linking: Linking the identified named entities to resources in a knowledge base (such as Wikipedia, DBpedia)
“The Town might be one of the best movies I have seen all year. So, so good. And don't worry Ben, we already forgave you for Gigli. Really.”
▪ Ben: http://dbpedia.org/page/Ben_Affleck (foaf:Person, yago:AmericanFilmActors)
▪ Gigli: http://dbpedia.org/page/Gigli (dbo:Film, yago:AmericanFilms)
▪ The Town: http://es.dbpedia.org/page/The_Town (dbpedia-owl:Film, schema.org/Movie)
Named Entity Linking
INFORMATION EXTRACTION
“The Town might be one of the best movies I have seen all year. So, so good. And don't worry Ben, we already forgave you for Gigli. Really.”
▪ Ben: http://dbpedia.org/page/Ben_Affleck (foaf:Person, yago:AmericanFilmActors)
▪ Gigli: http://dbpedia.org/page/Gigli (dbo:Film, yago:AmericanFilms)
▪ The Town: http://live.dbpedia.org/page/The_Town_(2012_TV_series) (dbo:TelevisionShow, http://schema.org/CreativeWork)
INFORMATION EXTRACTION
Named Entity Linking
Entity recognition and linking in microposts have been reported to be quite challenging:
1. Short and noisy nature: typographic errors, shortening of words, ambiguity, polysemy (Liu et al. 2013, Ritter et al. 2011, Meij et al. 2012)
2. Out Of Vocabulary (OOV) entity mention identification problem
   ➢ The Big Bang Theory being referred to as TBBT
3. Out Of Knowledge Base (OOKB) entity problem
   ➢ A new upcoming company, Widro
CHALLENGES
Systems/Tools | Approach | Domain | Entity Types/Classes | Taxonomy
ANNIE | Gazetteers & FSM | Newswire | 7 (adapted) | MUC
Stanford NER | CRF | Newswire | 4, 3 or 7 | CoNLL, ACE
Alchemy API | Machine Learning | Unspecified | 324 | Alchemy
NERD-ML | KNN & Naïve Bayes | Twitter | 4 | NERD
TextRazor | Machine Learning | Unspecified | 1779 | DBpedia, Freebase
Ritter et al., 2011 | CRF | Twitter | 3 or 10 | CoNLL, ACE
Liu et al., 2011 | KNN & CRF | Twitter | 4 | CoNLL, ACE
Kalina et al., 2013 | Gazetteers & FSM | Twitter | 3 or 10 | CoNLL
Derczynski et al., 2015 | Structured Learning (CRF) | Twitter | 10 | Freebase
ENTITY RECOGNITION
Tools | Taxonomy | Approach/Features used | Domain
DBpedia Spotlight (Mendes et al., 2011) | DBpedia, Freebase, Schema.org | Gazetteers and similarity metrics | Unspecified
TAGME (Ferragina and Scaiella, 2010) | Wikipedia | Wikipedia anchor texts and the pages linked to those anchor texts | Short texts
YODIE (Damljanovic and Bontcheva, 2012) | DBpedia | Similarity metrics and URI frequency | Twitter
Babelfy (Moro et al., 2014) | BabelNet semantic network | Graph-based approach, semantic signatures | Short text
Meij et al., 2012 | Wikipedia | n-gram features, concept features, and tweet features | Twitter
S-MART (Yang et al., 2015) | Wikipedia | Structural Learning (tree-based) | Twitter
Weasel (Tristram et al., 2015) | DBpedia | Machine Learning (using SVM) | Newspaper articles
Guo et al., 2013 | Wikipedia | Structural SVM | Twitter
Yamada et al., 2015 | Wikipedia | Supervised (string matching, n-grams) | Twitter
Callouts on the table: "Mention detection & disambiguation system: pipeline"; "Use NEL to learn how to perform NER: pipeline"
ENTITY LINKING
THE PROPOSED SYSTEM
▪ An end-to-end IE framework for microblogs to orchestrate NER and NEL
  • Entity Recognition and Classification
  • Candidate match retrieval for identified entities
  • Entity linking
  • Leverage entity linking to improve named entity classification
▪ Gold-standard corpus of ~2400 tweets (Ritter et al., 2011)
▪ Ground Truth: Manually curated set of 1616 named entities identified with entity types
▪ Use of DBpedia as an external KB
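The candidate match retrieval step can be illustrated with a small sketch. It assumes a toy in-memory list standing in for the rdfs:label index and uses fuzzy string matching from Python's standard library; the names `LABEL_INDEX` and `top_k_labels`, the labels themselves, and the cutoff value are hypothetical and are not taken from the presented system.

```python
from difflib import get_close_matches

# Toy stand-in for an rdfs:label index (hypothetical labels, not real DBpedia data).
LABEL_INDEX = [
    "The Town (2010 film)",
    "The Town (2012 TV series)",
    "Gigli",
    "Ben Affleck",
]

def top_k_labels(surface_form, labels, k=3, cutoff=0.3):
    """Return up to k labels that are lexically closest to the surface form."""
    # Compare case-insensitively, but return the labels in their original casing.
    lowered = {label.lower(): label for label in labels}
    matches = get_close_matches(surface_form.lower(), list(lowered), n=k, cutoff=cutoff)
    return [lowered[m] for m in matches]

candidates = top_k_labels("the town", LABEL_INDEX, k=2)
```

A surface form such as "the town" yields several candidate labels here, which is why a separate disambiguation step is still needed after retrieval.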
FRAMEWORK
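[Figure: framework pipeline. A tweet goes through Named Entity Recognition, which yields surface forms of named entities. Entity Search over an index of rdfs:label values returns the top-k labels for each surface form. Entity Disambiguation uses the surface form, entity type & context together with resource descriptions from the KB to select a resource for each label. Entity Linking outputs the resource for the surface form, which feeds the improvement of NER.]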
ENTITY RECOGNITION
▪ T-NER is grounded in Conditional Random Fields (Sutton and McCallum, 2006)
▪ Each entity e is classified into one or more entity types/classes c with a probability score $P_{CRF}(e, c)$
Experimental Analysis: Entity Recognition
NER Systems: T-NER (Ritter et al. 2011)
▪ Identification Errors
  ➢ “@vogueglamGIRL Ah I know! She is simply the best in The Sept Issue. My boyfriend’s aunt worked for Anna Wintor in NY”
▪ Classification Errors
  ➢ “Cant wait for the ravens game tomorrow....go ray rice!!!!!!!”
[Figure annotations on the example tweets: Entity type: O; Entity type: Geo-Loc; Entity type: Band; Entity type: Sportsteam]
$$P_{CRF}(e, c) = \exp\left(\sum_{k=1}^{K} w_k f_k(e, c)\right)$$
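As a minimal sketch of the score above: the function below computes exp(Σ w_k f_k(e, c)) for one entity and class. The weights and feature values are made up for illustration. The slide's formula shows no normalisation term, so the `normalise` helper, which divides by the sum over classes to obtain values that add up to 1, is an assumption added here and not part of the presented formula.

```python
import math

def crf_score(weights, features):
    """Score exp(sum_k w_k * f_k(e, c)) for one (entity, class) pair."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return math.exp(sum(w * f for w, f in zip(weights, features)))

def normalise(scores):
    """Assumed extra step: rescale per-class scores so they sum to 1."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical weights and binary feature values for two classes.
weights = [1.0, 0.5]
scores = {
    "Person": crf_score(weights, [1.0, 0.0]),
    "Band": crf_score(weights, [0.0, 1.0]),
}
probabilities = normalise(scores)
```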
T-NER Classification Performance
Text Phrase | Classification Level | Example Entity | Example Entity Type | Classification (%)
Entities (1496) | Correctly Classified | Justin Bieber | Person | 61.57
Entities (1496) | Incorrectly Classified | Chicago | Person | 37.96
Entities (1496) | Segmentation Error | Alpha, Omega (Alpha-Omega) | Geo-Location, Band | 0.47
Non-Entities (44k) | Correctly Classified | It | Outside (O) | 99.8
Non-Entities (44k) | Incorrectly Classified | justthen | Person | 0.2
▪ T-NER identifies 1496 named entities from the gold standard (GS), in contrast to 1616 entities in the ground truth.
▪ 8% of entities are not even recognized and are thus classified as non-entities (amongst the other 44k tokens)
Classification Error Rate: T-NER
Entity Type | Error (%)
Band | 73.83
Company | 21.9
Facility | 54.79
Geo-Location | 19.75
Movie | 75.83
Other | 46.29
Person | 28.18
Product | 39.70
Sportsteam | 48.27
TVshow | 48.71
ENTITY LINKING
DBpedia Titles files and NLP resources available at: http://wiki.dbpedia.org/Downloads2015-04
Entity Linking: Performance Analysis
Classifiable Named Entity | Linking Level | Example Entity | Example DBpedia Type | Linking (%)
Linkable | Correctly Linked | Wisconsin | Geo-Location | 63.11
Linkable | Incorrectly Linked | America | Movie | 3.05
Linkable | Uninformative | N.J. | Thing | 16.15
Non-Linkable | Uninformative | Secrets | Thing | 11.85
Non-Linkable | Generic | Whitney | Other | 5.83
A total of 1442 entities out of 1496 entities are disambiguated with ~4k candidate KB resources
Matching function, $P_{KB}(e, r_c)$, to detect the resource for a surface form of a named entity in the KB, if it exists:
1. Lexical Similarity, $lex(e, l_{r_c})$
2. Coherence, $coh(e^+, d_{r_c})$
Experimental Analysis: Entity Linking
$$\Rightarrow P_{KB}(e, r_c) = \alpha \cdot lex(e, l_{r_c}) + (1 - \alpha) \cdot coh(e^+, d_{r_c})$$
($\alpha$ currently set to 0.5)
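The weighted combination of lexical similarity and coherence can be sketched as follows. The slides do not say how lex and coh are computed, so both implementations below are stand-ins chosen for illustration: a character-level similarity ratio for lex and a token-overlap (Jaccard) score for coh. The function and parameter names are hypothetical; only the weighting with α = 0.5 follows the slide.

```python
from difflib import SequenceMatcher

def lex(surface_form, label):
    """Assumed lexical similarity: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, surface_form.lower(), label.lower()).ratio()

def coh(context, description):
    """Assumed coherence: Jaccard overlap between context and description tokens."""
    a = set(context.lower().split())
    b = set(description.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def p_kb(surface_form, context, label, description, alpha=0.5):
    """Weighted combination: alpha * lex + (1 - alpha) * coh."""
    return alpha * lex(surface_form, label) + (1 - alpha) * coh(context, description)

# Hypothetical tweet context and resource description.
score = p_kb(
    "Gigli",
    "we already forgave you for Gigli best movies",
    "Gigli",
    "Gigli is a 2003 American film",
)
```

With α = 0.5 both signals count equally; a larger α would favour the label match over the contextual overlap.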
ENTITY RECOGNITION ENHANCEMENT
[Figure: the framework pipeline repeated, with the final stage, fed by the resource for the surface form, labeled T-NER+]
$$c^*_e = \arg\max_{c} \{ P_{CRF}(e, c) \cdot P_{KB}(e, r_c) \}$$
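The re-classification rule can be sketched in a few lines. The per-class scores below are invented for illustration and do not come from the experiments. The fallback to the CRF-only decision when no class has KB support is an assumption added here, since the slide does not say how that case is handled.

```python
def reclassify(p_crf, p_kb):
    """Pick the class c that maximises P_CRF(e, c) * P_KB(e, r_c).

    p_crf: dict mapping class -> CRF score for the entity.
    p_kb:  dict mapping class -> matching score of the candidate resource of that class.
    """
    scores = {c: p_crf[c] * p_kb.get(c, 0.0) for c in p_crf}
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        # Assumed fallback: without any KB evidence, keep the CRF decision.
        return max(p_crf, key=p_crf.get)
    return best

# Hypothetical scores for the surface form "Canada".
p_crf = {"Person": 0.5, "Geo-Location": 0.4}
p_kb = {"Person": 0.2, "Geo-Location": 0.9}
new_type = reclassify(p_crf, p_kb)
```

In this made-up case the CRF alone prefers Person, while the product of the two scores prefers Geo-Location.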
Comparative Analysis: T-NER and T-NER+
Entity Type | T-NER Precision | T-NER Recall | T-NER F1 | T-NER+ Precision | T-NER+ Recall | T-NER+ F1
Band | 0.26 | 0.88 | 0.40 | 0.39 | 0.90 | 0.54
Company | 0.78 | 0.90 | 0.84 | 0.81 | 0.90 | 0.85
Facility | 0.45 | 0.72 | 0.55 | 0.50 | 0.72 | 0.59
Geo-Location | 0.80 | 0.95 | 0.87 | 0.80 | 0.95 | 0.87
Movie | 0.24 | 0.88 | 0.38 | 0.34 | 0.88 | 0.49
Other | 0.57 | 0.70 | 0.63 | 0.56 | 0.76 | 0.64
Person | 0.72 | 0.92 | 0.81 | 0.77 | 0.92 | 0.84
Product | 0.60 | 0.69 | 0.65 | 0.63 | 0.71 | 0.67
Sportsteam | 0.52 | 0.83 | 0.64 | 0.63 | 0.85 | 0.72
TVshow | 0.51 | 0.91 | 0.66 | 0.45 | 0.89 | 0.59
Overall | 0.62 | 0.87 | 0.73 | 0.66 | 0.88 | 0.76
Experimental Analysis: Entity Recognition Enhancement
Example: Re-classification of entities
Entity | Ground-Truth | T-NER | T-NER+
30stm | Band | Product | Band
Yahoo | Company | Band | Company
Southgate House | Facility | Band | Facility
Canada | Geo-Location | Person | Geo-Location
Camp rock 2 | Movie | Person | Movie
Thanksgiving | Other | Person | Other
John Acuff | Person | Facility | Person
iphone | Product | Company | Product
Lions | Sportsteam | Person | Sportsteam
TMZ | TVshow | Band | TVshow
$$\text{Precision } (P) = \frac{|\{cor.cl\} \cap \{cl\}|}{|\{cl\}|}$$
$$\text{Recall } (R) = \frac{|\{cor.cl\} \cap \{cl\}|}{|\{cor.cl\}|}$$
$$F_1 \text{ Measure} = \frac{2 \times P \times R}{P + R}$$
cor.cl denotes correctly classified entities, while cl denotes classified entities.
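The three measures can be computed with sets, mirroring the set notation above. This sketch assumes that, for a given entity type, {cor.cl} is the set of entities that truly belong to the type and {cl} is the set of entities the system assigned to it; that reading, the variable names, and the toy sets are assumptions made for illustration.

```python
def precision(gold, predicted):
    """|gold ∩ predicted| / |predicted|, or 0.0 when nothing was predicted."""
    return len(gold & predicted) / len(predicted) if predicted else 0.0

def recall(gold, predicted):
    """|gold ∩ predicted| / |gold|, or 0.0 when the gold set is empty."""
    return len(gold & predicted) / len(gold) if gold else 0.0

def f1(p, r):
    """Harmonic mean of precision and recall, or 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Toy sets for a single entity type (hypothetical entity identifiers).
gold = {"a", "b", "c"}
predicted = {"a", "b", "d", "e"}
p, r = precision(gold, predicted), recall(gold, predicted)
score = f1(p, r)
```

As a consistency check against the comparison table, `f1(0.26, 0.88)` rounds to 0.40, the F1 reported for Band under T-NER.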
▪ New knowledge emerges constantly on social media streams
▪ It is important to identify new knowledge in order to bridge the gap between the Unstructured Web and the Web of Data
▪ An end-to-end entity linking pipeline might be helpful for detecting new knowledge
▪ Entity linking can be used to improve the classification performance of an entity recognition system
▪ Improving entity recognition is crucial for identifying new entities
▪ Presented an end-to-end entity linking pipeline for short textual formats (microposts)
▪ Presented an approach for improving entity recognition through re-classification
▪ Marginal improvements observed in re-classification using linked entities
▪ There is definite scope for improving the current system
  ➢ New knowledge has been identified, though it is not dealt with currently
  ➢ Quality assessment, trustworthiness factors…
  ➢ Relation extraction from microposts to improve identification of new knowledge
  ➢ Experimenting with more recent datasets