Msm2013challenge

  1. MSM2013 IE Challenge: Leveraging Existing Tools for Named Entity Recognition in Microposts. Fréderic Godin, Pedro Debevere, Erik Mannens, Wesley De Neve and Rik Van de Walle. ELIS – Multimedia Lab, Ghent University – iMinds, Belgium; Image and Video Systems Lab, KAIST, South Korea.
  2. Making Sense of Microposts Workshop @ World Wide Web Conference 2013. Introduction: The challenge. Existing tools for NER are developed for news corpora. Goal: develop NER tools for microposts. 4 entity types: Person, Location, Organisation and Miscellaneous (film/movie, entertainment award event, political event, programming language, sporting event and TV show).
  3. How do current NER tools perform? (1) Rizzo et al. evaluated the performance of AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais and Zemanta on 5 TED talks, 1000 news articles, and 217 conference abstracts. Could we do the same evaluation for microposts?
  4. How do current NER tools perform? (2) Preprocessing: convert bracket tokens to brackets. Note: values can differ based on the ontology mapping used! F1 values:
                       PER      LOC      ORG      MISC
     AlchemyAPI        78.20%   74.60%   54.40%   10.20%
     Spotlight (0.2)   57.60%   46.40%   24.40%    5.00%
     Spotlight (0.5)   32.90%    3.70%    6.50%    7.30%
     OpenCalais        69.30%   73.10%   55.80%   31.40%
     Zemanta           70.40%   64.30%   48.10%   29.30%
  5. How do current NER tools perform? (3) AlchemyAPI: performs poorly at recognizing exotic names, small villages, buildings and organizations. Zemanta: same as AlchemyAPI, and additionally relies on capitalisation. OpenCalais: poor at recognizing small villages, buildings and organizations, but does recognize big events! DBpedia Spotlight: returns multiple ‘possible’ entities. What if we combine the power of all 4 services?
  6. Combining existing services (1) Apply machine learning to a feature vector built from the output of the different services. [Diagram labels: AlchemyAPI, DBpedia Spotlight, OpenCalais, Zemanta; service-specific entity; PER, LOC, ORG, MISC; confidence level; 16 features; Random Forest; PER, LOC, ORG, MISC.]
  7. Combining existing services (2) Evaluation on entity type:
                       PER      LOC      ORG      MISC
     Spotlight (0.2)   82.20%   75.70%   60.40%   47.40%
     Spotlight (0.5)   81.60%   74.30%   59.40%   40.50%
     Noisy input data gives better results. (Final results on the test set are not included and are part of the challenge.)
  8. Conclusions. Current NER tools do perform well in most cases. Shortcomings: incorrect use of capital letters; abbreviations of organisations; small villages, counties and buildings. Combining the output of several services yields good results.
  9. #Questions @frederic_godin #MMLab
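
Slide 4 reports F1 values per entity type. As a reminder of how such a score is derived, here is a minimal sketch. It assumes exact-match scoring over (mention, type) pairs, which the slides do not confirm as the criterion the authors used. The function name `f1_per_type` and the sample annotations are hypothetical.

```python
ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")


def f1_per_type(gold, predicted):
    """Return {type: (precision, recall, f1)} for sets of (mention, type) pairs.

    Assumption: a prediction counts as correct only if both the mention
    text and the entity type match a gold annotation exactly.
    """
    scores = {}
    for etype in ENTITY_TYPES:
        gold_t = {m for m in gold if m[1] == etype}
        pred_t = {m for m in predicted if m[1] == etype}
        true_pos = len(gold_t & pred_t)
        precision = true_pos / len(pred_t) if pred_t else 0.0
        recall = true_pos / len(gold_t) if gold_t else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[etype] = (precision, recall, f1)
    return scores


# Hypothetical annotations, not taken from the challenge data.
gold = {("Obama", "PER"), ("Ghent", "LOC"), ("KAIST", "ORG")}
predicted = {("Obama", "PER"), ("Ghent", "ORG"), ("KAIST", "ORG")}

scores = f1_per_type(gold, predicted)
```

In this toy example "Ghent" is typed as ORG instead of LOC, so the error lowers ORG precision and LOC recall at the same time. This is one reason the choice of ontology mapping can shift the per-type values.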
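
Slide 6 describes feeding a 16-feature vector, built from the output of the four services, into a Random Forest. The slides do not spell out what the 16 features are. The sketch below assumes, purely for illustration, a one-hot encoding of each service's mapped entity type (4 services × 4 types). The `TYPE_MAPPING` table and the `build_feature_vector` helper are hypothetical, and the confidence level mentioned on the slide is left out.

```python
SERVICES = ("AlchemyAPI", "DBpedia Spotlight", "OpenCalais", "Zemanta")
ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")

# Hypothetical ontology mapping from service-specific types to the four
# challenge types; the real mappings are not given in the slides.
TYPE_MAPPING = {
    "Person": "PER",
    "City": "LOC",
    "Country": "LOC",
    "Company": "ORG",
    "Organization": "ORG",
    "Movie": "MISC",
    "SportingEvent": "MISC",
}


def build_feature_vector(service_outputs):
    """One-hot encode each service's mapped type: 4 services x 4 types = 16.

    service_outputs maps a service name to the service-specific type it
    returned for a token; a service that returned nothing is simply absent.
    """
    features = []
    for service in SERVICES:
        # Unknown or missing types map to None and yield four zeros.
        mapped = TYPE_MAPPING.get(service_outputs.get(service))
        features.extend(1 if mapped == etype else 0 for etype in ENTITY_TYPES)
    return features


# Hypothetical service responses for a single token.
outputs = {"AlchemyAPI": "Person", "OpenCalais": "Person", "Zemanta": "Company"}
vector = build_feature_vector(outputs)
```

Vectors like this, paired with gold labels from the training set, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. That step is omitted here to keep the sketch free of dependencies.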
