NERD: Evaluating Named Entity
Recognition Tools in the Web of Data

     Giuseppe Rizzo <giuseppe.rizzo@eurecom.fr>
     Raphaël Troncy <raphael.troncy@eurecom.fr>
What is a Named Entity recognition task?

A task that aims to locate and classify, within a textual document, the names of persons, organizations, locations, brands and products, as well as numeric expressions such as times, dates, monetary amounts and percentages.
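
As a purely illustrative sketch (not the output of any of the tools discussed in this deck), the result of such a task can be pictured as a set of typed mentions over the input text; the sentence and the type labels below are made up for the example:

```python
# Hypothetical example of what a NER system is expected to produce.
# The sentence, the mentions and the type labels are illustrative only.
text = "Google unveiled its self-driving car in California in October 2010."

expected_mentions = [
    ("Google", "Organization"),
    ("California", "Location"),
    ("October 2010", "Date"),
]

for surface, ne_type in expected_mentions:
    start = text.find(surface)              # character offset of the mention
    end = start + len(surface)
    print(f"{surface!r} [{start}:{end}] -> {ne_type}")
```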




Named Entity recognition tools




Differences among those NER extractors

  Granularity
    - extract NEs from single sentences vs. from the entire document

  Technologies used
    - algorithms used to extract NEs
    - supported languages
    - taxonomy of NE types recognized
    - disambiguation (dataset used to provide links)
    - content request size
    - response format
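
Purely as an illustration of these comparison dimensions, each extractor could be described with a record like the one below; every field value here is a placeholder, not a claim about any particular tool:

```python
# Hypothetical profile of one extractor along the dimensions listed above.
# All values are placeholders, not facts about a real service.
example_extractor_profile = {
    "name": "SomeExtractor",
    "granularity": "document",             # "sentence" or "document"
    "algorithm": "undisclosed",            # often not published by the vendor
    "supported_languages": ["en"],
    "ne_taxonomy": ["Person", "Organization", "Location", "Product"],
    "disambiguation_dataset": None,        # e.g. a Linked Data dataset, if any
    "max_request_size_chars": 10_000,
    "response_format": "JSON",
}

print(example_extractor_profile["name"], "->", example_extractor_profile["response_format"])
```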



And ...

  - What about precision and recall?
  - Which extractor best fits my needs?



What is NERD?

NERD seeks to find the pros and cons of those extractors. It is composed of:
  - a REST API (1)
  - a UI (2)
  - an ontology (3)

(1) http://nerd.eurecom.fr/api/application.wadl
(2) http://nerd.eurecom.fr/
(3) http://nerd.eurecom.fr/ontology
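
The WADL document linked above is the authoritative description of the REST API, so a minimal client sketch stops at retrieving it; the concrete extraction endpoints and their parameters are defined there and are not guessed here:

```python
# Minimal sketch: download the NERD service description (WADL) listed above.
# The actual extraction endpoints and their parameters are defined in the WADL
# itself and are therefore not assumed in this snippet.
import urllib.request

WADL_URL = "http://nerd.eurecom.fr/api/application.wadl"

with urllib.request.urlopen(WADL_URL, timeout=10) as response:
    wadl = response.read().decode("utf-8")

print(wadl[:500])  # peek at the beginning of the service description
```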


Showcase




                  http://nerd.eurecom.fr


       Science: "Google Cars Drive Themselves", http://bit.ly/oTj8md (part
       of the original resource found at http://nyti.ms/9p19i8)




Evaluation

5 extractors using default configurations

  Controlled experiment
    - 4 human raters
    - 10 English news articles (5 from the BBC and 5 from The New York Times)
    - each rater evaluated each article for all the extractors
    - 200 evaluations in total (4 raters × 10 articles × 5 extractors)

  Uncontrolled experiment
    - 17 human raters
    - 53 English news articles (sources: CNN, BBC, The New York Times and Yahoo! News)
    - free selection of articles

Each human rater received training (1)

(1) http://nerd.eurecom.fr/help

Evaluation output



                           t = (NE, type, URI, relevant)

The assessment consists of rating each of these criteria with a Boolean value.

If the extractor provides no type or no disambiguation URI, that criterion is considered false by default.
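
A minimal sketch of how such assessment tuples can be stored and aggregated into precision figures of the kind reported in the following slides; the helper and the toy ratings are illustrative and simply encode the Boolean convention stated above:

```python
from typing import NamedTuple

class Assessment(NamedTuple):
    """One rated extraction: t = (NE, type, URI, relevant)."""
    ne: str
    type_ok: bool    # False when the extractor returned no type (or a wrong one)
    uri_ok: bool     # False when no disambiguation URI was returned (or a wrong one)
    relevant: bool

def precision(assessments, criterion):
    """Fraction of assessments judged correct for one criterion
    ('type_ok', 'uri_ok' or 'relevant')."""
    if not assessments:
        return 0.0
    return sum(getattr(a, criterion) for a in assessments) / len(assessments)

# Toy, hand-made ratings (not real evaluation data).
ratings = [
    Assessment("Google", type_ok=True, uri_ok=True, relevant=True),
    Assessment("cars", type_ok=False, uri_ok=False, relevant=False),
]
print(precision(ratings, "relevant"))  # 0.5
```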




Controlled experiment – dataset (1)

  Categories: World, Business, Sport, Science, Health

  1 BBC article and 1 NYT article for each category

  Average number of words per article: 981

  The final number of unique entities detected is 4641, with an average of 23.2 named entities per article

  Some of the extractors (e.g. DBpedia Spotlight and Extractiv) return duplicate NEs. We removed all duplicates so as not to bias the statistics

(1) http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz


Controlled experiment – agreement score

  Fleiss' kappa scores (1)

  [Figures: agreement grouped by extractor, by source, and by category]

(1) Joseph L. Fleiss. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382, 1971

Controlled experiment – statistic result

  [Figures: overall statistics, statistics grouped by extractor (showing different behavior for different sources), and statistics grouped by category]

Uncontrolled experiment – dataset

  17 raters were free to select English news articles from CNN, BBC, The New York Times and Yahoo! News

  53 news articles selected

  Total number of assessments = 94; average number of assessments per user = 5.2

  Each article was assessed with at least 2 different tools

  The final number of unique entities detected is 1616, with an average of 34 named entities per article

  Some of the extractors (e.g. DBpedia Spotlight and Extractiv) return duplicate NEs. We removed all duplicates so as not to bias the statistics

Uncontrolled experiment – statistic result (I)

  [Figures: overall precision and precision grouped by extractor]

Uncontrolled experiment – statistic result (II)

  [Figure: results grouped by category]

Conclusion

Q. Which are the best NER tools?
A. They are ...

  AlchemyAPI obtained the best results in NE extraction and categorization

  DBpedia Spotlight and Zemanta showed the ability to disambiguate NEs in the LOD cloud

  Experiments across categories of articles did not show significant differences in the analysis

  We published the WEKEX'11 ground truth:
  http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz




Future Work (NERD Timeline)

  beginning: core application
             uncontrolled experiment
             controlled experiment
  today:     REST API, release of the WEKEX'11 ground truth
  planned:   release of the ISWC'11 ground truth
             NERD "smart" service: combining the best of all NER tools




ISWC'11 golden set

  Do you believe it's easy to reach agreement among all raters?

  We'd like to invite you to help create a new golden set during the ISWC 2011 poster and demo session. We will kindly ask each rater to evaluate two short parts of two English news articles with all the extractors supported by NERD.


Thanks for your time and your attention




                      http://nerd.eurecom.fr

                           @giusepperizzo @rtroncy #nerd



                             http://www.slideshare.net/giusepperizzo



Fleiss' Kappa

  κ = (Pa − Pe) / (1 − Pe)

  where Pa is the mean observed agreement across items and Pe is the chance agreement expected from the category proportions

  κ = 1: full agreement among all raters
  κ = 0 (or less): poor agreement
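
A small, self-contained sketch of how Fleiss' kappa can be computed for the kind of Boolean judgments collected in the controlled experiment; the ratings at the bottom are made up for illustration:

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa. `ratings` is a list of items, where each item is the
    list of category labels assigned to it by the raters (every item must be
    judged by the same number of raters)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]  # raters per category, per item

    # Mean observed agreement Pa over all items
    p_per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    pa = sum(p_per_item) / n_items

    # Chance agreement Pe from the marginal category proportions
    proportions = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    pe = sum(p ** 2 for p in proportions)

    return (pa - pe) / (1 - pe)  # undefined when every rating falls in one category

# Toy example: 4 raters judging whether each extracted NE is relevant.
ratings = [
    [True, True, True, False],
    [True, True, True, True],
    [False, False, True, False],
]
print(round(fleiss_kappa(ratings), 3))  # 0.25
```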




Fleiss' kappa interpretation

  Kappa            Interpretation
  < 0              Poor agreement
  0.01 – 0.20      Slight agreement
  0.21 – 0.40      Fair agreement
  0.41 – 0.60      Moderate agreement
  0.61 – 0.80      Substantial agreement
  0.81 – 1.00      Almost perfect agreement
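
For convenience, the table above can be transcribed into a small lookup helper; values strictly between 0 and 0.01 are folded into the slight-agreement band here:

```python
def interpret_kappa(kappa):
    """Map a kappa value to the qualitative band of the table above.
    Values between 0 and 0.01 are folded into the 'Slight agreement' band."""
    if kappa < 0:
        return "Poor agreement"
    if kappa <= 0.20:
        return "Slight agreement"
    if kappa <= 0.40:
        return "Fair agreement"
    if kappa <= 0.60:
        return "Moderate agreement"
    if kappa <= 0.80:
        return "Substantial agreement"
    return "Almost perfect agreement"

print(interpret_kappa(0.25))  # the value from the sketch above -> Fair agreement
```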




