1. ELIS – Multimedia Lab
Fréderic Godin, Pedro Debevere, Erik Mannens,
Wesley De Neve and Rik Van de Walle
MSM2013 IE Challenge:
Leveraging Existing Tools for
Named Entity Recognition in Microposts
Multimedia Lab, Ghent University – iMinds, Belgium
Image and Video Systems Lab, KAIST, South Korea
2. 2
Making Sense of Microposts Workshop @ World Wide Web Conference 2013
Introduction: The challenge
Existing tools for NER are developed for news corpora
Develop NER tools for microposts
4 entity types:
Person
Location
Organisation
Miscellaneous (film/movie, entertainment award event, political event, programming language, sporting event and TV show)
3. 3
How do current NER tools perform? (1)
Rizzo et al. evaluated the performance of AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais and Zemanta
on 5 TED talks, 1000 news articles, and 217 conference abstracts.
Could we do the same evaluation for microposts?
4. 4
How do current NER tools perform? (2)
Preprocessing: convert bracket tokens to brackets
Note: values can differ depending on the ontology mapping used!
F1 values per entity type:
                  PER      LOC      ORG      MISC
AlchemyAPI        78.20%   74.60%   54.40%   10.20%
Spotlight (0.2)   57.60%   46.40%   24.40%    5.00%
Spotlight (0.5)   32.90%    3.70%    6.50%    7.30%
OpenCalais        69.30%   73.10%   55.80%   31.40%
Zemanta           70.40%   64.30%   48.10%   29.30%
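The F1 values in the table above combine precision and recall per entity type, after each service's own type labels have been mapped onto the four challenge types. The following is a minimal sketch of such a scoring, not the authors' evaluation script. It assumes token-level matching, and the entries in `TYPE_MAPPING`, the helper names and the toy data are all hypothetical; they only illustrate why a different ontology mapping can shift the scores.

```python
from collections import Counter

# Hypothetical mapping from service-specific type labels to challenge types.
# The mappings actually used in the evaluation are not given on the slides.
TYPE_MAPPING = {
    "Person": "PER",
    "City": "LOC",
    "Country": "LOC",
    "Company": "ORG",
    "Movie": "MISC",
}


def map_type(service_type):
    """Map a service-specific label to PER/LOC/ORG/MISC, or None if unmapped."""
    return TYPE_MAPPING.get(service_type)


def f1_per_type(gold, predicted):
    """Compute F1 per entity type from token-aligned gold and predicted labels.

    Both arguments are equal-length lists; None means 'no entity'.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if p is not None and p == g:
            tp[p] += 1
        else:
            if p is not None:
                fp[p] += 1  # predicted an entity type that is wrong here
            if g is not None:
                fn[g] += 1  # missed (or mislabelled) a gold entity
    scores = {}
    for t in ("PER", "LOC", "ORG", "MISC"):
        precision = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        recall = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        total = precision + recall
        scores[t] = 2 * precision * recall / total if total else 0.0
    return scores


# Toy example: five tokens, gold labels versus one service's raw output.
gold = ["PER", None, "LOC", "ORG", "MISC"]
raw_service_output = ["Person", None, "City", "Facility", "Movie"]
predicted = [map_type(t) if t else None for t in raw_service_output]
scores = f1_per_type(gold, predicted)
```

In this toy data the label "Facility" has no entry in the mapping, so the ORG token counts as missed. Adding a mapping for it would change the ORG score, which is the effect the note on this slide warns about.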
5. 5
How do current NER tools perform? (3)
AlchemyAPI: performs poorly at recognizing exotic names, small villages, buildings and organizations
Zemanta: same as AlchemyAPI, and also relies on capitalisation
OpenCalais: poor at recognizing small villages, buildings and organizations, but does recognize big events!
DBpedia Spotlight: returns multiple ‘possible’ entities
What if we combine the power of all 4 services?
6. 6
Combining existing services (1)
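Apply machine learning to a feature vector built from the output of the different services
Services: AlchemyAPI, DBpedia Spotlight, OpenCalais, Zemanta
Features taken from the services' output: confidence level, mapped entity type (PER, LOC, ORG, MISC), service-specific entity type
Feature vector: 16 features
Classifier: Random Forest
Output: PER, LOC, ORG, MISC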
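The combination step on this slide turns the output of the four services into one feature vector per token. Below is a minimal sketch of how such a vector could be assembled, not the authors' implementation. The dictionary keys (`mapped_type`, `service_type`, `confidence`), the integer encoding and the helper names are hypothetical. This layout yields 12 values per token rather than the 16 features mentioned on the slide, whose exact composition is not listed.

```python
SERVICES = ("AlchemyAPI", "DBpedia Spotlight", "OpenCalais", "Zemanta")
ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")


def encode_label(label, vocabulary):
    """Encode a categorical label as an integer; 0 is reserved for 'no entity'."""
    if label is None:
        return 0
    if label not in vocabulary:
        vocabulary[label] = len(vocabulary) + 1
    return vocabulary[label]


def build_feature_vector(token_annotations, service_vocab):
    """Build one flat feature vector for a single token.

    token_annotations maps a service name to a dict with the hypothetical keys
    'mapped_type', 'service_type' and 'confidence'. Services that returned
    nothing for the token are simply absent from the mapping.
    """
    features = []
    for service in SERVICES:
        ann = token_annotations.get(service, {})
        mapped = ann.get("mapped_type")
        # Mapped challenge type: 1..4 for PER/LOC/ORG/MISC, 0 for no entity.
        features.append(ENTITY_TYPES.index(mapped) + 1 if mapped in ENTITY_TYPES else 0)
        # Service-specific type, integer-encoded through a shared vocabulary.
        features.append(encode_label(ann.get("service_type"), service_vocab))
        # Confidence reported by the service, 0.0 when nothing was returned.
        features.append(float(ann.get("confidence", 0.0)))
    return features


# Toy example: two of the four services tagged the token as a person.
vocab = {}
annotations = {
    "AlchemyAPI": {"mapped_type": "PER", "service_type": "Person", "confidence": 0.9},
    "OpenCalais": {"mapped_type": "PER", "service_type": "Person", "confidence": 0.7},
}
vector = build_feature_vector(annotations, vocab)
```

Rows like `vector`, paired with the gold entity type of each token, could then be used to train a random forest classifier, for example scikit-learn's `RandomForestClassifier`. The slides do not say which implementation was used.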
7. 7
Combining existing services (2)
Evaluation on entity type
                  PER      LOC      ORG      MISC
Spotlight (0.2)   82.20%   75.70%   60.40%   47.40%
Spotlight (0.5)   81.60%   74.30%   59.40%   40.50%
Noisy input data gives better results: Spotlight (0.2) scores higher than Spotlight (0.5) on every entity type
(Final results on the test set are not included; they are part of the challenge.)
8. 8
Conclusions
Current NER tools do perform well in most cases
Shortcomings: Incorrect use of capital letters
Abbreviations of organisations
Small villages, counties and buildings
Combining the output of several services yields good results
9. 9
#Questions @frederic_godin #MMLab