ELIS – Multimedia Lab
Fréderic Godin, Pedro Debevere, Erik Mannens,
Wesley De Neve and Rik Van de Walle
MSM2013 IE Challenge:
Leveraging Existing Tools for
Named Entity Recognition in Microposts
Multimedia Lab, Ghent University – iMinds, Belgium
Image and Video Systems Lab, KAIST, South Korea
Making Sense of Micropost Workshop @ World Wide Web Conference 2013
Introduction: The challenge
Existing tools for NER are developed for news corpora
Develop NER tools for microposts
4 entity types:
  Person (PER)
  Location (LOC)
  Organisation (ORG)
  Miscellaneous (MISC): film/movie, entertainment award event, political event, programming language, sporting event and TV show
How do current NER tools perform? (1)
Rizzo et al. evaluated the performance of:
  AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais and Zemanta
On:
  5 TED talks, 1000 news articles, and 217 conference abstracts.
Could we do the same evaluation for microposts?
How do current NER tools perform? (2)
Preprocessing: convert bracket tokens to brackets
Note: values can differ based on ontology mapping used!
F1 values per entity type:
                   PER      LOC      ORG      MISC
AlchemyAPI         78.20%   74.60%   54.40%   10.20%
Spotlight (0.2)    57.60%   46.40%   24.40%    5.00%
Spotlight (0.5)    32.90%    3.70%    6.50%    7.30%
OpenCalais         69.30%   73.10%   55.80%   31.40%
Zemanta            70.40%   64.30%   48.10%   29.30%
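The note about ontology mapping can be made concrete with a small sketch. It computes per-type F1 on toy data under two mappings from service-specific types to PER, LOC, ORG and MISC. The type names ("City", "Facility", and so on), both mappings and the example mentions are hypothetical and chosen only for illustration; they are not the ontology of any of the evaluated services, and this is not the evaluation script used for the figures above.

```python
from collections import Counter

TYPES = ("PER", "LOC", "ORG", "MISC")

# Hypothetical mappings from service-specific types to the challenge types.
MAPPING_A = {"Person": "PER", "City": "LOC", "Company": "ORG", "Movie": "MISC"}
# Same as MAPPING_A, but "Facility" is additionally mapped to LOC.
MAPPING_B = {**MAPPING_A, "Facility": "LOC"}

def f1_per_type(gold, predicted, mapping):
    """Return {type: F1} for exact (mention, type) matches.

    gold:      set of (mention, challenge_type) pairs
    predicted: list of (mention, service_specific_type) pairs
    mapping:   dict from service-specific type to challenge type
    """
    # Predictions whose type has no mapping are dropped.
    mapped = {(m, mapping[t]) for m, t in predicted if t in mapping}
    tp, fp, fn = Counter(), Counter(), Counter()
    for mention, etype in mapped:
        if (mention, etype) in gold:
            tp[etype] += 1
        else:
            fp[etype] += 1
    for mention, etype in gold - mapped:
        fn[etype] += 1
    scores = {}
    for t in TYPES:
        precision = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        recall = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        total = precision + recall
        scores[t] = 2 * precision * recall / total if total else 0.0
    return scores

# Toy data (made up for this sketch).
GOLD = {("Ghent", "LOC"), ("Eiffel Tower", "LOC"), ("Obama", "PER")}
PREDICTED = [("Ghent", "City"), ("Eiffel Tower", "Facility"), ("Obama", "Person")]

scores_a = f1_per_type(GOLD, PREDICTED, MAPPING_A)
scores_b = f1_per_type(GOLD, PREDICTED, MAPPING_B)
```

With the same service output, the LOC score differs between the two mappings because "Facility" is only counted as a location under the second one.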
How do current NER tools perform? (3)
AlchemyAPI: performs poorly at recognizing exotic names, small villages, buildings and organizations
Zemanta: same weaknesses as AlchemyAPI, and it also relies on capitalisation
OpenCalais: poor at recognizing small villages, buildings and organizations, but does recognize big events!
DBpedia Spotlight: returns multiple ‘possible’ entities
What if we combine the power of all 4 services?
Combining existing services (1)
Apply machine learning on a feature vector of the output of the different services
[Diagram: AlchemyAPI, DBpedia Spotlight, OpenCalais and Zemanta each contribute features (confidence level; PER, LOC, ORG, MISC type; service-specific entity), 16 features in total. A Random Forest takes these features as input and outputs PER, LOC, ORG or MISC.]
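A minimal sketch of how the outputs of the four services could be turned into one feature vector. The slide does not spell out how the 16 features are laid out, so the layout here is an assumption: one block of four slots (PER, LOC, ORG, MISC) per service, holding the confidence of the type that service predicted. The service-specific entity features from the diagram are left out, and `build_feature_vector` is a hypothetical helper name.

```python
SERVICES = ("AlchemyAPI", "DBpedia Spotlight", "OpenCalais", "Zemanta")
TYPES = ("PER", "LOC", "ORG", "MISC")

def build_feature_vector(outputs):
    """Encode per-service predictions as a flat list of 16 floats.

    outputs: dict mapping a service name to (mapped_type, confidence).
             A service that found no entity is simply absent from the dict.
    Assumed layout (not taken from the slides): 4 services x 4 types.
    """
    vector = []
    for service in SERVICES:
        block = [0.0] * len(TYPES)
        result = outputs.get(service)
        if result is not None:
            entity_type, confidence = result
            # Put the confidence in the slot of the predicted type.
            block[TYPES.index(entity_type)] = confidence
        vector.extend(block)
    return vector

# Toy input (made up): three services respond, DBpedia Spotlight does not.
example = build_feature_vector({
    "AlchemyAPI": ("PER", 0.9),
    "OpenCalais": ("PER", 0.7),
    "Zemanta": ("ORG", 0.4),
})
```

Vectors built this way, paired with gold labels from the training set, could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; that step is omitted to keep the sketch dependency-free.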
Combining existing services (2)
Evaluation on entity type
                   PER      LOC      ORG      MISC
Spotlight (0.2)    82.20%   75.70%   60.40%   47.40%
Spotlight (0.5)    81.60%   74.30%   59.40%   40.50%
Noisy input data gives better results
(final results on the test set are not included; they are part of the challenge)
Conclusions
Current NER tools do perform well in most cases
Shortcomings:
  Incorrect use of capital letters
  Abbreviations of organisations
  Small villages, counties and buildings
Combining the output of several services yields good results
#Questions @frederic_godin #MMLab
