KDIR2015-Entity Linking and Knowledge Discovery in Microblogs-Presentation

PIKAKSHI MANCHANDA
DISCo, University of Milano-Bicocca, Milan, Italy
@pikakshi787
Manchanda et al., Leveraging Entity Linking for Entity Recognition in Microposts
KDIR 2015
KDIR 2015, Lisbon, 12th November 2015

SOCIAL MEDIA: ENTITIES-EMOJIS-EVENTS
▪ People increasingly communicate and share information through social media platforms
▪ Fresh information emerges in real time on social media platforms, primarily:
• New entities (newly emerging, newly relevant/popular)
• New relationships
• Factual information
• Events

WHY INFORMATION EXTRACTION??
[Figure: diagram with the labels Existing entities (Apple: Company; IBM OS2: Product), New entity (Product Launch) (Apple Watch: Product), New Relations, and Unstructured Web]
WHY SOCIAL MEDIA PLATFORMS??
▪ Fresh
▪ Real-time info
▪ Incomplete KBs

MOTIVATION
▪ Bridging the gap between the Unstructured Web and the Web of Data
• Intrinsic incompleteness of KBs
▪ Information Extraction from social media streams (microposts, ...)
• Named Entity Recognition (NER)
• Named Entity Classification
• Named Entity Linking (NEL)
▪ Knowledge Base (KB) enrichment
• Identify new knowledge
• Improve NER
• Lexically enrich knowledge bases for existing and new entities

INFORMATION EXTRACTION
▪ Named Entity Recognition: the task of identifying named entities in a piece of text
▪ Named Entities: text fragments that refer to entities in the real world (proper nouns, ...)
▪ Named Entity Classification: classifying recognized named entities into entity types such as PERSON, LOCATION, ORGANIZATION, …
▪ Named Entity Linking: linking the identified named entities to resources in a knowledge base (such as Wikipedia or DBpedia)

NAMED ENTITY LINKING
"The Town might be one of the best movies I have seen all year. So, so good. And don't worry Ben, we already forgave you for Gigli. Really."
Ben → http://dbpedia.org/page/Ben_Affleck (foaf:Person, yago:AmericanFilmActors)
Gigli → http://dbpedia.org/page/Gigli (dbo:Film, yago:AmericanFilms)
The Town → http://es.dbpedia.org/page/The_Town (dbpedia-owl:Film, schema.org/Movie)

NAMED ENTITY LINKING
"The Town might be one of the best movies I have seen all year. So, so good. And don't worry Ben, we already forgave you for Gigli. Really."
Ben → http://dbpedia.org/page/Ben_Affleck (foaf:Person, yago:AmericanFilmActors)
Gigli → http://dbpedia.org/page/Gigli (dbo:Film, yago:AmericanFilms)
The Town → http://live.dbpedia.org/page/The_Town_(2012_TV_series) (dbo:TelevisionShow, http://schema.org/CreativeWork)

CHALLENGES
Entity recognition and linking in microposts have been reported to be quite challenging:
1. Short and noisy text: typographic errors, shortened words, ambiguity, polysemy (Liu et al. 2013, Ritter et al. 2011, Meij et al. 2012)
2. Out Of Vocabulary (OOV) entity mention identification problem
➢ The Big Bang Theory being referred to as TBBT
3. Out of Knowledge Base (OOKB) entity problem
➢ A new, upcoming company, Widro

ENTITY RECOGNITION
Systems/Tools | Approach | Domain | Entity Types/Classes | Taxonomy
ANNIE | Gazetteers & FSM | Newswire | 7 (adapted) | MUC
Stanford NER | CRF | Newswire | 4, 3 or 7 | CoNLL, ACE
Alchemy API | Machine Learning | Unspecified | 324 | Alchemy
NERD-ML | KNN & Naïve Bayes | Twitter | 4 | NERD
TextRazor | Machine Learning | Unspecified | 1779 | DBpedia, Freebase
Ritter et al., 2011 | CRF | Twitter | 3 or 10 | CoNLL, ACE
Liu et al., 2011 | KNN & CRF | Twitter | 4 | CoNLL, ACE
Kalina et al., 2013 | Gazetteers & FSM | Twitter | 3 or 10 | CoNLL
Derczynski et al., 2015 | Structured Learning (CRF) | Twitter | 10 | Freebase

ENTITY LINKING
Tools | Taxonomy | Approach/Features used | Domain
DBpedia Spotlight (Mendes et al., 2011) | DBpedia, Freebase, Schema.org | Gazetteers and similarity metrics | Unspecified
TAGME (Ferragina and Scaiella, 2010) | Wikipedia | Wikipedia anchor texts and the pages linked to those anchor texts | Short texts
YODIE (Damljanovic and Bontcheva, 2012) | DBpedia | Similarity metrics and URI frequency | Twitter
Babelfy (Moro et al., 2014) | BabelNet semantic network | Graph-based approach, semantic signatures | Short text
Meij et al., 2012 | Wikipedia | n-gram features, concept features, and tweet features | Twitter
S-MART (Yang et al., 2015) | Wikipedia | Structural Learning (tree-based) | Twitter
Weasel (Tristram et al., 2015) | DBpedia | Machine Learning (using SVM) | Newspaper articles
Guo et al., 2013 | Wikipedia | Structural SVM | Twitter
Yamada et al., 2015 | Wikipedia | Supervised (string matching, n-grams) | Twitter
Callouts on the slide:
▪ Mention detection & disambiguation system: pipeline
▪ Use NEL to learn how to perform NER: pipeline

THE PROPOSED SYSTEM
▪ An end-to-end IE framework for microblogs to orchestrate NER and NEL
• Entity Recognition and Classification
• Candidate match retrieval for identified entities
• Entity linking
• Leverage entity linking to improve named entity classification
▪ Gold-standard corpus of ~2400 tweets (Ritter et al., 2011)
▪ Ground truth: manually curated set of 1616 named entities annotated with entity types
▪ DBpedia used as an external KB

FRAMEWORK
[Figure: framework pipeline. Tweet → Named Entity Recognition → surface forms of named entities → Entity Search over an Index (rdfs:label) → top-k labels for each surface form → resource for each label → Entity Disambiguation with a function f of (surface form, entity type & context) and the resource description from the KB → resource for the surface form (Entity Linking) → Improvement of NER]
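The Entity Search step of the framework retrieves the top-k labels for each surface form from an index over rdfs:label values. A minimal Python sketch of such a lookup follows. It is an illustration under assumptions, not the system's implementation: the in-memory `LABEL_INDEX`, its label strings, the function name `top_k_labels`, and the use of `difflib.SequenceMatcher` as the ranking score are all hypothetical; only the URIs come from the linking example on the earlier slides.

```python
from difflib import SequenceMatcher

# Hypothetical toy index of (label, resource URI) pairs. A real index would be
# built from the rdfs:label values of the knowledge base.
LABEL_INDEX = [
    ("Ben Affleck", "http://dbpedia.org/page/Ben_Affleck"),
    ("Gigli", "http://dbpedia.org/page/Gigli"),
    ("The Town", "http://es.dbpedia.org/page/The_Town"),
    ("The Town (2012 TV series)",
     "http://live.dbpedia.org/page/The_Town_(2012_TV_series)"),
]


def top_k_labels(surface_form, index, k=3):
    """Return up to k (score, label, uri) tuples, best match first.

    The score is a case-insensitive string-similarity ratio in [0, 1];
    this ranking is an assumption made for the sketch.
    """
    scored = [
        (SequenceMatcher(None, surface_form.lower(), label.lower()).ratio(),
         label, uri)
        for label, uri in index
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


candidates = top_k_labels("The Town", LABEL_INDEX, k=2)
```

With this toy index, the two candidates for "The Town" are the film and the TV series, which is the ambiguity that the disambiguation step then has to resolve.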

ENTITY RECOGNITION
[Figure: framework diagram repeated for the Named Entity Recognition stage]

Experimental Analysis: Entity Recognition
NER Systems: T-NER (Ritter et al. 2011)
▪ T-NER is grounded on Conditional Random Fields (Sutton and McCallum, 2006)
▪ Each entity e is classified into one or more entity types/classes c with a probability score P_CRF(e, c):
$$P_{CRF}(e, c) = \exp\left(\sum_{k=1}^{K} w_k f_k(e, c)\right)$$
▪ Identification errors
➢ “@vogueglamGIRL Ah I know! She is simply the best in The Sept Issue. My boyfriend’s aunt worked for Anna Wintor in NY”
▪ Classification errors
➢ “Cant wait for the ravens game tomorrow....go ray rice!!!!!!!”
Entity-type labels shown on the example tweets: O, Geo-Loc, Band, Sportsteam
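The score P_CRF(e, c) on this slide is the exponential of a weighted sum of K feature functions. The sketch below evaluates that expression in Python for given weights and feature values. The function names, the example weights and the final normalisation over classes are assumptions added for illustration; the slide shows only the exponential term, and T-NER's actual features are not listed here.

```python
import math


def crf_score(weights, features):
    """Evaluate exp(sum_k w_k * f_k(e, c)) for one (entity, class) pair."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length K")
    return math.exp(sum(w * f for w, f in zip(weights, features)))


def normalise(scores):
    """Scale per-class scores so they sum to 1 (an assumed extra step)."""
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical weights and binary feature values for two candidate classes.
weights = [1.2, 0.4, -0.7]
features_by_class = {
    "Sportsteam": [1.0, 1.0, 0.0],
    "Band": [0.0, 1.0, 1.0],
}
scores = {c: crf_score(weights, f) for c, f in features_by_class.items()}
probabilities = normalise(scores)
```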

T-NER Classification Performance
Text Phrase | Classification Level | Example Entity | Example Entity Type | Classification (%)
Entities (1496) | Correctly Classified | Justin Bieber | Person | 61.57
Entities (1496) | Incorrectly Classified | Chicago | Person | 37.96
Entities (1496) | Segmentation Error | Alpha, Omega (Alpha-Omega) | Geo-Location, Band | 0.47
Non-Entities (44k) | Correctly Classified | It | Outside (O) | 99.8
Non-Entities (44k) | Incorrectly Classified | justthen | Person | 0.2
▪ T-NER identifies 1496 named entities in the gold standard (GS), against 1616 entities in the ground truth.
▪ The remaining 120 entities (roughly 8%) are not recognized at all and are therefore classified as non-entities (among the other 44k tokens).
Classification Error Rate: T-NER
Entity Type | Error (%)
Band | 73.83
Company | 21.9
Facility | 54.79
Geo-Location | 19.75
Movie | 75.83
Other | 46.29
Person | 28.18
Product | 39.70
Sportsteam | 48.27
TVshow | 48.71

ENTITY LINKING
[Figure: framework diagram repeated for the Entity Linking stage]
DBpedia Titles files and NLP resources available at: http://wiki.dbpedia.org/Downloads2015-04

Experimental Analysis: Entity Linking
Entity Linking: Performance Analysis
Classifiable Named Entity | Linking Level | Example Entity | Example DBpedia Type | Linking (%)
Linkable | Correctly Linked | Wisconsin | Geo-Location | 63.11
Linkable | Incorrectly Linked | America | Movie | 3.05
Linkable | Uninformative | N.J. | Thing | 16.15
Non-Linkable | Uninformative | Secrets | Thing | 11.85
Non-Linkable | Generic | Whitney | Other | 5.83
A total of 1442 of the 1496 entities are disambiguated against ~4k candidate KB resources.
The matching function P_KB(e, r_c), used to detect the resource in the KB for the surface form of a named entity, if it exists, combines:
1. Lexical similarity, lex(e, l_{r_c})
2. Coherence, coh(e^+, d_{r_c})
$$P_{KB}(e, r_c) = \alpha \cdot lex(e, l_{r_c}) + (1 - \alpha) \cdot coh(e^{+}, d_{r_c})$$
(α currently set to 0.5)
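The matching function on this slide is a weighted combination of a lexical similarity term and a coherence term, with the weight set to 0.5. A small Python sketch of that combination follows. The weight is called `alpha` here as an assumed name, and both component measures are stand-ins chosen for illustration: `lex` uses `difflib.SequenceMatcher` between the surface form and a candidate label, and `coh` uses Jaccard overlap between context tokens and description tokens. The measures actually used by the system are not specified on the slide.

```python
from difflib import SequenceMatcher

ALPHA = 0.5  # weight value reported on the slide


def lex(surface_form, label):
    """Stand-in lexical similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, surface_form.lower(), label.lower()).ratio()


def coh(context_tokens, description_tokens):
    """Stand-in coherence: Jaccard overlap of two token sets (assumed measure)."""
    context, description = set(context_tokens), set(description_tokens)
    if not context or not description:
        return 0.0
    return len(context & description) / len(context | description)


def p_kb(surface_form, context_tokens, label, description_tokens, alpha=ALPHA):
    """alpha * lex + (1 - alpha) * coh for one candidate resource."""
    return (alpha * lex(surface_form, label)
            + (1 - alpha) * coh(context_tokens, description_tokens))


# Hypothetical context and description tokens for the mention "Gigli".
score = p_kb("Gigli", {"movie", "forgave"}, "Gigli", {"movie", "2003", "film"})
```

Setting `alpha` closer to 1 would favour string similarity, and closer to 0 would favour contextual overlap.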

ENTITY RECOGNITION ENHANCEMENT
[Figure: framework diagram repeated, with the final stage labelled T-NER+]
$$c^{*}_{e} = \arg\max_{c} \{ P_{CRF}(e, c) \cdot P_{KB}(e, r_c) \}$$
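The re-classification rule on this slide picks, for each entity, the class that maximises the product of the recognizer score P_CRF and the knowledge-base matching score P_KB. The Python sketch below shows that argmax. All numbers are hypothetical, and treating a class with no candidate resource as having a P_KB of 0 is an assumption of the sketch, not something the slide states.

```python
def reclassify(p_crf, p_kb):
    """Return the class c maximising P_CRF(e, c) * P_KB(e, r_c).

    p_crf maps class -> recognizer score for the entity.
    p_kb maps class -> matching score of the candidate resource of that
    class; classes without a candidate default to 0.0 (assumption).
    """
    return max(p_crf, key=lambda c: p_crf[c] * p_kb.get(c, 0.0))


# Hypothetical scores for the surface form "Canada".
p_crf = {"Person": 0.5, "Geo-Location": 0.4, "Band": 0.1}
p_kb = {"Person": 0.2, "Geo-Location": 0.9}

original_type = max(p_crf, key=p_crf.get)   # class preferred by P_CRF alone
enhanced_type = reclassify(p_crf, p_kb)     # class after combining with P_KB
```

In this made-up case the recognizer alone prefers Person, while the combined score prefers Geo-Location, the kind of shift the deck reports for "Canada" in its re-classification examples.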

Experimental Analysis: Entity Recognition Enhancement
Comparative Analysis: T-NER and T-NER+
Entity Type | T-NER Precision | T-NER Recall | T-NER F1 | T-NER+ Precision | T-NER+ Recall | T-NER+ F1
Band | 0.26 | 0.88 | 0.40 | 0.39 | 0.90 | 0.54
Company | 0.78 | 0.90 | 0.84 | 0.81 | 0.90 | 0.85
Facility | 0.45 | 0.72 | 0.55 | 0.50 | 0.72 | 0.59
Geo-Location | 0.80 | 0.95 | 0.87 | 0.80 | 0.95 | 0.87
Movie | 0.24 | 0.88 | 0.38 | 0.34 | 0.88 | 0.49
Other | 0.57 | 0.70 | 0.63 | 0.56 | 0.76 | 0.64
Person | 0.72 | 0.92 | 0.81 | 0.77 | 0.92 | 0.84
Product | 0.60 | 0.69 | 0.65 | 0.63 | 0.71 | 0.67
Sportsteam | 0.52 | 0.83 | 0.64 | 0.63 | 0.85 | 0.72
TVshow | 0.51 | 0.91 | 0.66 | 0.45 | 0.89 | 0.59
Overall | 0.62 | 0.87 | 0.73 | 0.66 | 0.88 | 0.76
Example: Re-classification of entities
Entity | Ground Truth | T-NER | T-NER+
30stm | Band | Product | Band
Yahoo | Company | Band | Company
Southgate House | Facility | Band | Facility
Canada | Geo-Location | Person | Geo-Location
Camp rock 2 | Movie | Person | Movie
Thanksgiving | Other | Person | Other
John Acuff | Person | Facility | Person
iphone | Product | Company | Product
Lions | Sportsteam | Person | Sportsteam
TMZ | TVshow | Band | TVshow
$$\text{Precision } (P) = \frac{|\{cor.cl\} \cap \{cl\}|}{|\{cl\}|}$$
$$\text{Recall } (R) = \frac{|\{cor.cl\} \cap \{cl\}|}{|\{cor.cl\}|}$$
$$F_1 = \frac{2 \times P \times R}{P + R}$$
cor.cl denotes correctly classified entities, while cl denotes classified entities.
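The precision, recall and F1 definitions above translate directly into set operations. The Python sketch below follows the slide's notation, with `cor_cl` and `cl` as sets of entity identifiers. The function names and the toy sets are illustrative assumptions; the F1 values in the usage lines use the Band row of the comparison table.

```python
def precision(cor_cl, cl):
    """|cor_cl ∩ cl| / |cl|, or 0.0 when nothing was classified."""
    return len(cor_cl & cl) / len(cl) if cl else 0.0


def recall(cor_cl, cl):
    """|cor_cl ∩ cl| / |cor_cl|, or 0.0 when cor_cl is empty."""
    return len(cor_cl & cl) / len(cor_cl) if cor_cl else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Toy sets of entity identifiers (hypothetical).
cl = {"e1", "e2", "e3", "e4"}
cor_cl = {"e1", "e2", "e5"}
p, r = precision(cor_cl, cl), recall(cor_cl, cl)

# F1 recomputed from the reported Band precision and recall.
band_tner = round(f1(0.26, 0.88), 2)       # reported F1: 0.40
band_tner_plus = round(f1(0.39, 0.90), 2)  # reported F1: 0.54
```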

▪ New knowledge emerges constantly on social media streams
▪ It is important to identify new knowledge in order to bridge the gap between the Unstructured Web and the Web of Data
▪ An end-to-end entity linking pipeline might be helpful for detecting new knowledge
▪ Entity linking can be used to improve the classification performance of an entity recognition system
▪ Improving entity recognition is crucial for identifying new entities

▪ Presented an end-to-end entity linking pipeline for short textual formats (microposts)
▪ Presented an approach for improving entity recognition through re-classification
▪ Marginal improvements observed in re-classification using linked entities
▪ There is definite scope for improving the current system
▪ New knowledge has been identified, though it is not yet dealt with
▪ Quality assessment, trustworthy factors…
▪ Relation extraction from microposts to improve identification of new knowledge
▪ Experimenting with more recent datasets
