
Msm2013challenge


Our submission for the Making Sense of Microposts IE Challenge at the World Wide Web Conference 2013



  1. MSM2013 IE Challenge: Leveraging Existing Tools for Named Entity Recognition in Microposts
     Fréderic Godin, Pedro Debevere, Erik Mannens, Wesley De Neve and Rik Van de Walle
     ELIS – Multimedia Lab, Ghent University – iMinds, Belgium
     Image and Video Systems Lab, KAIST, South Korea
     Making Sense of Microposts Workshop @ World Wide Web Conference 2013
  2. Introduction: the challenge
     Existing tools for NER are developed for news corpora.
     Goal: develop NER tools for microposts.
     4 entity types: Person, Location, Organisation, and Miscellaneous (film/movie, entertainment award event, political event, programming language, sporting event and TV show).
  3. How do current NER tools perform? (1)
     Rizzo et al. evaluated the performance of AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais and Zemanta on 5 TED talks, 1000 news articles, and 217 conference abstracts.
     Could we do the same evaluation for microposts?
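The per-type scores on the following slides come from this kind of evaluation. A minimal sketch of how per-entity-type F1 can be computed from gold and predicted annotations; the function name and the (start, end, type) data layout are assumptions for illustration, not any tool's actual API:

```python
def f1_per_type(gold, predicted):
    """Per-entity-type F1 over sets of (start, end, entity_type) tuples.

    Hypothetical data layout: each annotation is a character span plus a
    type label such as "PER", "LOC", "ORG" or "MISC".
    """
    scores = {}
    for t in {tt for _, _, tt in gold | predicted}:
        g = {(s, e) for s, e, tt in gold if tt == t}
        p = {(s, e) for s, e, tt in predicted if tt == t}
        tp = len(g & p)                      # exact-span, exact-type matches
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        scores[t] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores
```

Only exact span matches count as true positives here; a partially overlapping span (a common failure mode on noisy microposts) scores as both a false positive and a false negative.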
  4. How do current NER tools perform? (2)
     Preprocessing: convert bracket tokens to brackets.
     F1 values:

                      PER      LOC      ORG      MISC
     AlchemyAPI       78.20%   74.60%   54.40%   10.20%
     Spotlight (0.2)  57.60%   46.40%   24.40%    5.00%
     Spotlight (0.5)  32.90%    3.70%    6.50%    7.30%
     OpenCalais       69.30%   73.10%   55.80%   31.40%
     Zemanta          70.40%   64.30%   48.10%   29.30%

     Note: values can differ based on the ontology mapping used!
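The bracket preprocessing step might look as follows. The slide only says "convert bracket tokens to brackets", so the Penn-Treebank-style token names below are an assumption:

```python
# Hypothetical token set: Penn-Treebank-style bracket placeholders.
BRACKET_TOKENS = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}

def restore_brackets(text: str) -> str:
    """Replace bracket placeholder tokens with literal bracket characters."""
    for token, bracket in BRACKET_TOKENS.items():
        text = text.replace(token, bracket)
    return text
```

Restoring literal brackets matters because the NER web services expect natural-looking text, not tokenizer placeholders.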
  5. How do current NER tools perform? (3)
     AlchemyAPI: performs poorly at recognizing exotic names, small villages, buildings and organizations.
     Zemanta: same shortcomings as AlchemyAPI, and also relies on capitalisation.
     OpenCalais: poor at recognizing small villages, buildings and organizations, but does recognize big events!
     DBpedia Spotlight: returns multiple 'possible' entities.
     What if we combine the power of all 4 services?
  6. Combining existing services (1)
     Apply machine learning to a feature vector built from the output of the different services.
     Each of the 4 services (AlchemyAPI, DBpedia Spotlight, OpenCalais and Zemanta) contributes its confidence level and its service-specific entity type, for 16 features in total.
     A Random Forest classifier maps these features to the final entity type: PER, LOC, ORG or MISC.
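The combination step could be sketched with scikit-learn as below. This is an illustration under assumptions: the slide does not spell out the 16-feature layout, so here each of the 4 services contributes one confidence value per entity type (4 services x 4 types = 16 columns), and the training rows are toy data rather than the challenge corpus:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy feature rows: for each of the 4 services (AlchemyAPI, DBpedia
# Spotlight, OpenCalais, Zemanta), a confidence per entity type
# (PER, LOC, ORG, MISC) -- an assumed layout, 16 columns in total.
X_train = [
    # A candidate all services tag as PER with high confidence
    [0.9, 0.0, 0.1, 0.0,  0.8, 0.0, 0.0, 0.0,
     0.7, 0.1, 0.0, 0.0,  0.9, 0.0, 0.0, 0.0],
    # A candidate the services mostly tag as LOC
    [0.0, 0.8, 0.0, 0.0,  0.1, 0.9, 0.0, 0.0,
     0.0, 0.6, 0.0, 0.0,  0.0, 0.7, 0.1, 0.0],
]
y_train = ["PER", "LOC"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Final entity type for a new candidate entity.
label = clf.predict([[0.85, 0.0, 0.05, 0.0,  0.9, 0.0, 0.0, 0.0,
                      0.6, 0.0, 0.1, 0.0,  0.8, 0.1, 0.0, 0.0]])[0]
```

A Random Forest is a reasonable choice here: it handles the mixed, partly redundant confidence features without scaling and gives a single label even when the services disagree.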
  7. Combining existing services (2)
     Evaluation on entity type:

                      PER      LOC      ORG      MISC
     Spotlight (0.2)  82.20%   75.70%   60.40%   47.40%
     Spotlight (0.5)  81.60%   74.30%   59.40%   40.50%

     Noisy input data gives better results.
     (Final results on the test set are not included; they are part of the challenge.)
  8. Conclusions
     Current NER tools do perform well in most cases.
     Shortcomings: incorrect use of capital letters; abbreviations of organisations; small villages, counties and buildings.
     Combining the output of several services yields good results.
  9. #Questions @frederic_godin #MMLab
