NERD: Evaluating Named Entity Recognition Tools in the Web of Data

Talk "NERD: Evaluating Named Entity
Recognition Tools in the Web of Data" event during WEKEX'11 workshop (ISWC'11), Bonn, Germany

  1. NERD: Evaluating Named Entity Recognition Tools in the Web of Data. Giuseppe Rizzo <giuseppe.rizzo@eurecom.fr>, Raphaël Troncy <raphael.troncy@eurecom.fr>
  2. What is a Named Entity recognition task? A task that aims to locate and classify the names of persons, organizations, locations, brands and products, as well as numeric expressions such as time, date, money and percent, in a textual document.
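As a concrete illustration of the task described on slide 2, the sketch below runs an off-the-shelf NER pipeline over a sentence related to the showcase article. spaCy is used only as a stand-in here; it is not one of the extractors evaluated in these slides.

```python
# Minimal NER sketch using spaCy as an illustrative stand-in; spaCy is not
# one of the five extractors evaluated in this talk.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline, assumed installed
doc = nlp("Google unveiled self-driving cars in Mountain View in October 2010.")

for ent in doc.ents:
    # Each named entity is a located span plus a coarse type (ORG, GPE, DATE, ...)
    print(ent.text, ent.label_)
```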
  3. Named Entity recognition tools.
  4. Differences among those NER extractors:
     - Granularity: extract NE from sentences vs. from the entire document
     - Technologies: algorithms used to extract NE
     - Supported languages
     - Taxonomy of NE types recognized
     - Disambiguation (dataset used to provide links)
     - Content request size
     - Response format
  5. And... what about precision and recall? Which extractor best fits my needs?
  6. What is NERD? It seeks to find the pros and cons of those extractors, and provides a REST API (http://nerd.eurecom.fr/api/application.wadl), a UI (http://nerd.eurecom.fr/) and an ontology (http://nerd.eurecom.fr/ontology).
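For readers who want to script against the NERD REST API, the fragment below shows the general shape of such a call. The endpoint path, parameter names and response fields are assumptions made for illustration only; the actual interface is the one described by http://nerd.eurecom.fr/api/application.wadl.

```python
# Hypothetical sketch of posting text to a NER web service and reading back
# annotations. Endpoint path, parameters and response fields are assumptions;
# see http://nerd.eurecom.fr/api/application.wadl for the real interface.
import requests

API_ROOT = "http://nerd.eurecom.fr/api"            # assumed base URL
response = requests.post(
    f"{API_ROOT}/extraction",                      # hypothetical endpoint
    data={"extractor": "alchemyapi",               # hypothetical parameter
          "text": "Google Cars Drive Themselves"},
    timeout=30,
)
response.raise_for_status()
for annotation in response.json():                 # assumed JSON list of annotations
    print(annotation.get("label"), annotation.get("type"), annotation.get("uri"))
```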
  7. Showcase: http://nerd.eurecom.fr. Science article "Google Cars Drive Themselves", http://bit.ly/oTj8md (part of the original resource found at http://nyti.ms/9p19i8).
  8. Evaluation: 5 extractors using default configurations.
     - Controlled experiment: 4 human raters, 10 English news articles (5 from BBC and 5 from The New York Times); each rater evaluated each article for all the extractors, 200 evaluations in total.
     - Uncontrolled experiment: 17 human raters, 53 English news articles (sources: CNN, BBC, The New York Times and Yahoo! News), free selection of articles.
     Each human rater received a training (http://nerd.eurecom.fr/help).
  9. Evaluation output: t = (NE, type, URI, relevant). The assessment consists of rating these criteria with a Boolean value. If no type or no disambiguation URI is provided by the extractor, it is considered false by default.
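The tuple t = (NE, type, URI, relevant) and the default-false rule can be captured in a small data structure. The field names below are illustrative only, not taken from the NERD implementation.

```python
# Sketch of one assessment record t = (NE, type, URI, relevant). Per the rule
# on slide 9, a missing type or disambiguation URI counts as False by default.
# Field names are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Assessment:
    ne: str            # extracted surface form
    type_ok: bool      # rater judgment on the assigned type
    uri_ok: bool       # rater judgment on the disambiguation URI
    relevant: bool     # rater judgment on relevance

def assess(ne: str, type_: Optional[str], uri: Optional[str],
           type_ok: bool, uri_ok: bool, relevant: bool) -> Assessment:
    # If the extractor returned no type or no URI, that criterion is False.
    return Assessment(
        ne=ne,
        type_ok=type_ok if type_ is not None else False,
        uri_ok=uri_ok if uri is not None else False,
        relevant=relevant,
    )
```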
  10. Controlled experiment, dataset (http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz):
     - Categories: World, Business, Sport, Science, Health; 1 BBC article and 1 NYT article for each category.
     - Average number of words per article: 981.
     - The final number of unique entities detected is 4641, with an average number of named entities per article equal to 23.2.
     - Some of the extractors (e.g. DBpedia Spotlight and Extractiv) produce NE duplicates; we removed all duplicates in order not to bias the statistics (a deduplication sketch follows below).
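The duplicate filtering mentioned above could look like the sketch below: one occurrence is kept per (surface form, type, URI) triple so that repeated detections from a single extractor do not inflate the counts. The choice of key is an assumption, not the documented NERD procedure.

```python
# Sketch of duplicate removal for a list of extracted entities, keeping the
# first occurrence of each (surface form, type, URI) triple. The key is an
# assumption; the slides only state that duplicates were removed.
def deduplicate(entities):
    seen = set()
    unique = []
    for e in entities:
        key = (e["ne"].lower(), e.get("type"), e.get("uri"))
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique
```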
  11. Controlled experiment, agreement score: Fleiss' kappa (Joseph L. Fleiss. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382, 1971), reported grouped by extractor, by source and by category.
  12. Controlled experiment, statistics: overall statistics and results grouped by extractor and by category; the extractors show different behavior for different sources.
  13. Uncontrolled experiment, dataset:
     - 17 raters were free to select English news articles from CNN, BBC, The New York Times and Yahoo! News; 53 news articles were selected.
     - Total number of assessments = 94, an average of 5.2 assessments per rater; each article was assessed with at least 2 different tools.
     - The final number of unique entities detected is 1616, with an average number of named entities per article equal to 34.
     - Some of the extractors (e.g. DBpedia Spotlight and Extractiv) produce NE duplicates; in order not to bias the statistics, we removed all duplicates.
  14. Uncontrolled experiment, statistics (I): overall precision and precision grouped by extractor.
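Given the Boolean assessments, per-extractor precision is simply the share of extracted items judged correct for each criterion. The sketch below shows one way to compute it; the dictionary keys mirror the illustrative Assessment fields above and are not the exact format used in the evaluation.

```python
# Sketch: per-extractor precision for each Boolean criterion, computed as the
# fraction of assessed items judged correct. Field names are assumptions.
from collections import defaultdict

def precision_by_extractor(assessments):
    # assessments: iterable of dicts such as
    # {"extractor": "zemanta", "type_ok": True, "uri_ok": False, "relevant": True}
    totals = defaultdict(lambda: {"n": 0, "type_ok": 0, "uri_ok": 0, "relevant": 0})
    for a in assessments:
        t = totals[a["extractor"]]
        t["n"] += 1
        for criterion in ("type_ok", "uri_ok", "relevant"):
            t[criterion] += bool(a[criterion])
    return {
        extractor: {c: t[c] / t["n"] for c in ("type_ok", "uri_ok", "relevant")}
        for extractor, t in totals.items() if t["n"]
    }
```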
  15. Uncontrolled experiment, statistics (II): results grouped by category.
  16. Conclusion. Q: Which are the best NER tools? A: They are...
     - AlchemyAPI obtained the best results in NE extraction and categorization.
     - DBpedia Spotlight and Zemanta showed the ability to disambiguate NE in the LOD cloud.
     - Experiments across categories of articles did not show significant differences in the analysis.
     - The WEKEX'11 ground truth has been published: http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz
  17. Future work (NERD timeline): from the beginning to today the timeline covers the core application, the REST API, the controlled and uncontrolled experiments and the release of the WEKEX'11 ground truth; next come the release of an ISWC'11 ground truth and a NERD "smart" service combining the best of all NER tools.
  18. ISWC'11 golden set: do you believe it is easy to find an agreement among all raters? We would like to invite you to create a new golden set during the ISWC 2011 poster and demo session. We will kindly ask each rater to evaluate two short parts of two English news articles with all the extractors supported by NERD.
  19. Thanks for your time and your attention. http://nerd.eurecom.fr, @giusepperizzo, @rtroncy, #nerd, http://www.slideshare.net/giusepperizzo
  20. Fleiss' kappa measures inter-rater agreement corrected for chance agreement, K = (observed agreement - chance agreement) / (1 - chance agreement): K = 1 means full agreement among all raters; K = 0 (or less) means poor agreement, no better than chance.
  21. Fleiss' kappa interpretation:
     - Kappa < 0: poor agreement
     - 0.01 – 0.20: slight agreement
     - 0.21 – 0.40: fair agreement
     - 0.41 – 0.60: moderate agreement
     - 0.61 – 0.80: substantial agreement
     - 0.81 – 1.00: almost perfect agreement
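For completeness, here is a minimal computation of Fleiss' kappa as defined in Fleiss (1971), the statistic reported on slide 11. The final print line is a made-up example with four raters and Boolean categories, not data from the experiments.

```python
# Minimal Fleiss' kappa (Fleiss, 1971) for ratings such as the Boolean
# judgments used in these experiments.
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j;
    every row must sum to the same number of raters n."""
    N = len(counts)
    n = sum(counts[0])
    # Observed agreement on each item, then averaged.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Agreement expected by chance from the marginal category proportions.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(counts[0]))]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# Made-up example: 4 items rated by 4 raters on (True, False).
print(fleiss_kappa([[4, 0], [3, 1], [2, 2], [4, 0]]))
```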