ELIS – Multimedia Lab
Fréderic Godin, Pedro Debevere, Erik Mannens,
Wesley De Neve and Rik Van de Walle
MSM2013 IE Challenge:
Leveraging Existing Tools for
Named Entity Recognition in Microposts
Multimedia Lab, Ghent University – iMinds, Belgium
Image and Video Systems Lab, KAIST, South Korea
Making Sense of Micropost Workshop @ World Wide Web Conference 2013
Introduction: The challenge
Existing NER tools were developed for news corpora
Develop NER tools for microposts
4 entity types:
  Person (PER)
  Location (LOC)
  Organisation (ORG)
  Miscellaneous (MISC): film/movie, entertainment award event, political event, programming language, sporting event and TV show
How do current NER tools perform? (1)
Rizzo et al. evaluated the performance of AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais and Zemanta on 5 TED talks, 1000 news articles and 217 conference abstracts.
Could we do the same evaluation for microposts?
How do current NER tools perform? (2)
Preprocessing: convert bracket tokens to brackets
Note: the values can differ depending on the ontology mapping used to map each service's entity types onto PER, LOC, ORG and MISC.
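Both notes on this slide can be sketched in Python. This is a minimal sketch under stated assumptions: it assumes the "bracket tokens" are Penn Treebank-style escapes such as -LRB- and -RRB-, and the type table TYPE_MAPPING, together with the helper names restore_brackets and map_type, is hypothetical rather than the mapping the authors used.

```python
import re
from typing import Optional

# Assumption: bracket tokens are Penn Treebank-style escapes (-LRB-, -RRB-, ...).
BRACKET_TOKENS = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}
_BRACKET_RE = re.compile("|".join(re.escape(tok) for tok in BRACKET_TOKENS))


def restore_brackets(text: str) -> str:
    """Replace every bracket token in the text with the literal bracket character."""
    return _BRACKET_RE.sub(lambda m: BRACKET_TOKENS[m.group(0)], text)


# Hypothetical ontology mapping: service-specific type name -> challenge type.
# Choosing a different mapping changes the resulting F1 values.
TYPE_MAPPING = {
    "Person": "PER",
    "Place": "LOC",
    "City": "LOC",
    "Country": "LOC",
    "Organisation": "ORG",
    "Organization": "ORG",
    "Company": "ORG",
    "Film": "MISC",
    "TelevisionShow": "MISC",
    "ProgrammingLanguage": "MISC",
}


def map_type(service_type: str) -> Optional[str]:
    """Map a type such as 'DBpedia:Person' to PER/LOC/ORG/MISC, or None if unmapped."""
    local_name = service_type.split(":")[-1]  # drop an ontology prefix like 'DBpedia:'
    return TYPE_MAPPING.get(local_name)
```

For example, restore_brackets("Obama -LRB- POTUS -RRB-") returns "Obama ( POTUS )", and map_type("DBpedia:Place") returns "LOC".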
F1 values per entity type:
Service           PER     LOC     ORG     MISC
AlchemyAPI        78.20%  74.60%  54.40%  10.20%
Spotlight (0.2)   57.60%  46.40%  24.40%   5.00%
Spotlight (0.5)   32.90%   3.70%   6.50%   7.30%
OpenCalais        69.30%  73.10%  55.80%  31.40%
Zemanta           70.40%  64.30%  48.10%  29.30%
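F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), computed separately for each entity type. The sketch below assumes an exact-match criterion in which an annotation counts as correct only if tweet ID, surface form and type all match; the challenge's official scoring may differ, and the tuple layout and the function name f1_per_type are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed annotation layout: (tweet_id, surface_form, entity_type).
Annotation = Tuple[str, str, str]
ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")


def f1_per_type(gold: Iterable[Annotation],
                predicted: Iterable[Annotation]) -> Dict[str, float]:
    """Compute exact-match F1 for each entity type."""
    gold_set: Set[Annotation] = set(gold)
    pred_set: Set[Annotation] = set(predicted)
    scores: Dict[str, float] = {}
    for etype in ENTITY_TYPES:
        g = {a for a in gold_set if a[2] == etype}
        p = {a for a in pred_set if a[2] == etype}
        true_positives = len(g & p)
        precision = true_positives / len(p) if p else 0.0
        recall = true_positives / len(g) if g else 0.0
        denom = precision + recall
        # F1 is defined as 0 when both precision and recall are 0.
        scores[etype] = 2 * precision * recall / denom if denom else 0.0
    return scores
```

With one correct PER prediction and an ORG type wrongly assigned to a location, PER scores 1.0 while ORG precision drops to 0.5 and LOC recall to 0.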
How do current NER tools perform? (3)
AlchemyAPI: performs poorly at recognizing exotic names, small villages, buildings and organizations
Zemanta: same weaknesses as AlchemyAPI, and also relies on capitalisation
OpenCalais: performs poorly at recognizing small villages, buildings and organizations, but does recognize big events
DBpedia Spotlight: returns multiple ‘possible’ entities
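A sketch of how DBpedia Spotlight can be queried with a confidence threshold, and how a single annotation can carry several candidate type labels. Assumptions: the public endpoint URL below (the 2013 service lived at a different address), that the (0.2) and (0.5) labels in the F1 table refer to Spotlight's confidence parameter, and the JSON field names (Resources, @URI, @surfaceForm, @types), which the endpoint returns when the request sends an Accept: application/json header. The helper names are hypothetical.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

# Assumption: current public endpoint, not necessarily the one used in 2013.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def spotlight_request_url(text: str, confidence: float) -> str:
    """Build a GET URL for the annotate endpoint; a higher confidence typically yields fewer annotations."""
    return SPOTLIGHT_URL + "?" + urlencode({"text": text, "confidence": confidence})


def parse_spotlight(response_json: str) -> List[Dict[str, object]]:
    """Return surface form, URI and all candidate type labels for each annotation."""
    data = json.loads(response_json)
    mentions = []
    for resource in data.get("Resources", []):
        # @types is a comma-separated string; it may be empty or mix several ontologies.
        types = [t for t in resource.get("@types", "").split(",") if t]
        mentions.append({
            "surface_form": resource["@surfaceForm"],
            "uri": resource["@URI"],
            "types": types,
        })
    return mentions
```

Sending the request itself is left out so the sketch runs offline; the list of type labels per mention is what an ontology mapping then has to reduce to a single PER, LOC, ORG or MISC label.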
What if we combine the power of all 4 services?
Combining existing services (1)
Apply machine learning to a feature vector built from the output of the different services
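[Diagram: AlchemyAPI, DBpedia Spotlight, OpenCalais and Zemanta each provide a confidence level for the service-specific entity, mapped to PER, LOC, ORG and MISC (4 services × 4 types = 16 features); a Random Forest takes these 16 features and outputs PER, LOC, ORG or MISC.]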
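The 16 features follow from 4 services × 4 entity types. A minimal sketch of assembling such a vector, assuming each service's output for a candidate entity has already been mapped to a confidence per challenge type and that a missing service or type is encoded as 0.0; the slide does not give the authors' exact encoding, and the names below are hypothetical.

```python
from typing import Dict, List

SERVICES = ("AlchemyAPI", "DBpediaSpotlight", "OpenCalais", "Zemanta")
ENTITY_TYPES = ("PER", "LOC", "ORG", "MISC")


def feature_vector(outputs: Dict[str, Dict[str, float]]) -> List[float]:
    """Flatten per-service confidences into a fixed-order vector of 4 x 4 = 16 features.

    outputs maps a service name to {entity type: confidence}. A service that did not
    recognise the candidate, or did not assign a given type, contributes 0.0.
    Feature index = service_index * 4 + type_index.
    """
    return [outputs.get(service, {}).get(etype, 0.0)
            for service in SERVICES
            for etype in ENTITY_TYPES]
```

Keeping the order fixed matters: the classifier only sees positions, so feature 9 must always mean "OpenCalais, LOC".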
Combining existing services (2)
Evaluation per entity type:
                  PER     LOC     ORG     MISC
Spotlight (0.2)   82.20%  75.70%  60.40%  47.40%
Spotlight (0.5)   81.60%  74.30%  59.40%  40.50%
Noisier input data gives better results: Spotlight (0.2) outperforms Spotlight (0.5) for every entity type
(Final results on the test set are not included here; they are part of the challenge.)
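A sketch of the classification and per-type evaluation step with scikit-learn's RandomForestClassifier, assuming the 16-dimensional feature vectors and gold labels are already available. The 70/30 stratified split, n_estimators=100 and the function name train_and_evaluate are illustrative choices, not the authors' settings; the slide does not state how the reported values were obtained.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

ENTITY_TYPES = ["PER", "LOC", "ORG", "MISC"]


def train_and_evaluate(X, y, seed=0):
    """Train a Random Forest on 16-dim feature vectors and return (model, F1 per type)."""
    # Hold out 30% of the labelled entities, keeping the class proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # average=None yields one F1 value per label, in the order given by labels.
    scores = f1_score(y_test, y_pred, labels=ENTITY_TYPES,
                      average=None, zero_division=0)
    return clf, dict(zip(ENTITY_TYPES, scores))
```

Comparing the two Spotlight settings then amounts to building the feature vectors once from Spotlight (0.2) output and once from Spotlight (0.5) output, and calling train_and_evaluate on each.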
Conclusions
Current NER tools perform well in most cases
Shortcomings:
  Incorrect use of capital letters
  Abbreviations of organisations
  Small villages, counties and buildings
Combining the output of several services yields good results
#Questions @frederic_godin #MMLab
