
Harnessing Diversity in Crowds and Machines for Better NER Performance

In recent years, information extraction tools have gained great popularity and brought significant performance improvements in extracting meaning from structured or unstructured data. For example, named entity recognition (NER) tools identify types such as people, organizations or places in text. However, despite their high F1 performance, NER tools are still prone to brittleness due to their highly specialized and constrained input and training data. Thus, each tool is able to extract only a subset of the named entities (NE) mentioned in a given text. In order to improve NE coverage, we propose a hybrid approach, where we first aggregate the output of various NER tools and then validate and extend it through crowdsourcing. The results from our experiments show that this approach performs significantly better than the individual state-of-the-art tools (including existing tools that already integrate individual outputs). Furthermore, we show that the crowd is quite effective in (1) identifying mistakes, inconsistencies and ambiguities in the currently used ground truth, and (2) providing a promising approach to gather ground truth annotations for NER that capture a multitude of opinions.


Harnessing Diversity in Crowds and Machines for Better NER Performance (slide transcript)

  1. Harnessing Diversity in Crowds and Machines for Better NER Performance. Oana Inel and Lora Aroyo, Web & Media Group, http://CrowdTruth.org. Extended Semantic Web Conference, May 30th, 2017.
  2. What's wrong with the ground truth for NER? Ground truth datasets:
     • OKE2015: Open Knowledge Extraction challenge (ESWC 2015): 101 sentences, 664 entities (https://github.com/anuzzolese/oke-challenge)
     • OKE2016: Open Knowledge Extraction challenge (ESWC 2016): 55 sentences, 340 entities (https://github.com/anuzzolese/oke-challenge-2016)
     • Named entities of type: Person, Place, Organization & Role
  3. What's wrong with the ground truth for NER?
     • Inconsistencies in annotating entities of type ROLE and PERSON: "... controversial [bishop] [Richard Williamson]." ([bishop] → role, [Richard Williamson] → person)
     • ... and similar for "... appointed Knight Bachelor by the [Queen] [Elizabeth II] in 1988."
     • But "Bishop Petronius" is a single PERSON: "Bologna was reborn in the 5th century under [Bishop Petronius]."
  4. What's wrong with the ground truth for NER?
     • Inconsistencies across datasets: 4 cases in OKE2015 where a concatenation of entities of type PLACE was treated as a single entity, but the offsets do not match the actual string: "Such a man did post-doctoral work at the Salk Institute in San Diego in the laboratory of Renato Dulbecco, then worked at the Basel Institute for Immunology in [Basel, Switzerland]."
     • In OKE2016, such entities were classified as two entities of type PLACE
  5. What's wrong with the ground truth for NER?
     • Errors in the ground truth: 1 case in OKE2015: "[One of the them] was an eminent scholar at Berkeley."
     • Personal pronouns (co-references) and possessive pronouns are considered named entities of type PERSON: 85/304 cases in OKE2015, 27/105 cases in OKE2016: "Giulio Natta was born in Imperia, Italy. [He] earned [his] degree in chemical engineering from the Politecnico di Milano university in Milan in 1924."
  6. How reliable are the NER tools trained using this ground truth? ... but, then ...
  7. How do state-of-the-art NER tools perform on this ground truth? Performance of NER tools:
     • NERD-ML (http://nerd.eurecom.fr)
     • TextRazor (https://www.textrazor.com)
     • THD (http://ner.vse.cz/thd/)
     • DBpediaSpotlight (http://dbpedia-spotlight.github.io/demo/)
     • SemiTags (http://nlp.vse.cz/SemiTags/)
  8. ... let's see how NER tools perform on this GT. NER tools:
     • do not extract personal pronouns (co-references) and possessive pronouns as entities of type PERSON: 83/85 missed cases in OKE2015, 26/27 missed cases in OKE2016: "Giulio Natta was born in Imperia, Italy. [He] earned [his] degree in chemical engineering from the Politecnico di Milano university in Milan in 1924."
     • extract multiple span variations for the same entity of type PERSON: 73/92 cases in OKE2015, 9/13 cases in OKE2016: "The woman was awarded the Nobel Prize in Physics in 1963, which she shared with [[J.] [[Hans] D.] [Jensen]] and Eugene Wigner."
  9. ... let's see how NER tools perform on this GT
     • Many entities of type ORGANIZATION are a combination of ORGANIZATION + PLACE: "Such a man did post-doctoral work at the Salk Institute in San Diego in the laboratory of Renato Dulbecco, then worked at the [[[Basel] [Institute]] for Immunology] in Basel, Switzerland."
     • NER tools tend to extract each entity granularity, but the GT does not allow for overlapping entities or multiple perspectives: 105/213 cases in OKE2015, 62/157 cases in OKE2016
  10. ... and now let's see their overall performance
      • High disagreement between the NER tools: similar performance in F1, but different #FP, #TP, #FN
  11. ... and now let's see their overall performance
      • High disagreement between the NER tools: similar performance in F1, but different #FP, #TP, #FN
      • Low recall: many entities are missed
  12. It is difficult to understand the reliability of the different NER tools, and thus difficult to choose the "best one" for your use case.
  13. Hypothesis: Aggregating the output of NER tools by harnessing the inter-tool disagreement (Multi-NER) performs better than the individual NER tools (Single-NER).
  14. Multi-NER vs. SOTA Single-NER
  15. Multi-NER vs. SOTA Single-NER
      • Multi-NER: significantly higher #TP & lower #FN
  16. Multi-NER vs. SOTA Single-NER
      • Multi-NER: significantly higher #TP & lower #FN
      • Multi-NER: significantly higher #FP
  17. Multi-NER vs. SOTA Single-NER: The more the merrier? Is performance correlated with the number of NER tools that extracted a given named entity?
      • Performance comparison: the likelihood of an entity to be contained in the gold standard, based on how many NER tools extracted it
      • Sentence-entity score = ratio of NER tools that extracted the entity (see the sketch below)
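To make the sentence-entity score concrete, here is a minimal Python sketch (the function name and toy data are hypothetical; the actual pipeline and code live at http://github.com/CrowdTruth). It treats each tool's output as a set of (sentence, span, type) triples and scores each triple by the fraction of tools that extracted it:

```python
from collections import Counter

def sentence_entity_scores(tool_outputs):
    """Sentence-entity score per entity: the ratio of NER tools
    that extracted that (sentence_id, span, entity_type) triple."""
    counts = Counter()
    for extractions in tool_outputs:
        counts.update(set(extractions))  # each tool votes at most once
    return {entity: n / len(tool_outputs) for entity, n in counts.items()}

# Toy input: one set of extracted triples per tool (5 tools, as in the talk).
tools = [
    {("s1", "Basel", "PLACE"), ("s1", "Basel Institute for Immunology", "ORGANIZATION")},
    {("s1", "Basel", "PLACE")},
    {("s1", "Basel", "PLACE"), ("s1", "Basel, Switzerland", "PLACE")},
    {("s1", "Basel", "PLACE"), ("s1", "Basel, Switzerland", "PLACE")},
    {("s1", "Basel Institute for Immunology", "ORGANIZATION")},
]
scores = sentence_entity_scores(tools)
# ("s1", "Basel", "PLACE")                                 -> 0.8 (4 of 5 tools)
# ("s1", "Basel, Switzerland", "PLACE")                    -> 0.4 (2 of 5 tools)
# ("s1", "Basel Institute for Immunology", "ORGANIZATION") -> 0.4 (2 of 5 tools)
```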
  18. Disagreement-Aware Multi-NER vs. Single-NER: Multi-NER outperforms the SOTA Single-NER tools at a sentence-entity score threshold >= 0.4
  19. Disagreement-Aware Multi-NER vs. Single-NER: Multi-NER at a sentence-entity score threshold >= 0.4 outperforms the majority vote approach (sentence-entity score threshold = 0.6)
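For intuition on the two thresholds: with five tools, a threshold of 0.6 keeps an entity only when at least three tools extract it (the majority vote baseline on the slide), while 0.4 keeps everything at least two tools agree on. A small follow-up to the sketch above, with toy values rather than actual results:

```python
# Toy scores, as produced by sentence_entity_scores() above.
scores = {
    ("s1", "Basel", "PLACE"): 0.8,
    ("s1", "Basel, Switzerland", "PLACE"): 0.4,
    ("s1", "Basel Institute for Immunology", "ORGANIZATION"): 0.4,
}

majority_vote = {e for e, s in scores.items() if s >= 0.6}       # >= 3 of 5 tools
disagreement_aware = {e for e, s in scores.items() if s >= 0.4}  # >= 2 of 5 tools
# The lower threshold keeps the two entities that only a pair of tools
# agreed on, recovering recall that strict majority voting throws away.
```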
  20. Hypothesis: A crowdsourced ground truth gathered by harnessing inter-annotator disagreement (Multi-NER+CrowdGT) produces diversity in annotations and thus improves the aggregated output of NER tools.
  21. Crowdsourcing for Better Ground Truth
      • Identify all the valid expressions and their types
      • Reduce the number of FP and FN cases
  22. Crowdsourcing Task
  23. Crowdsourcing for Better NER: At every crowd sentence-entity score threshold, the crowd-enhanced Multi-NER (Multi-NER+Crowd) performs much better than Multi-NER.
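The crowd sentence-entity score plays the analogous role on the crowdsourcing side. A minimal sketch, under the simplifying assumption that the score is just the fraction of workers who highlighted a span; the real CrowdTruth metrics additionally weight workers by quality (see http://dev.CrowdTruth.org):

```python
def crowd_sentence_entity_score(worker_selections, span):
    """Fraction of workers who highlighted `span` as a valid named entity.
    worker_selections: one set of selected spans per worker."""
    votes = sum(span in selected for selected in worker_selections)
    return votes / len(worker_selections)

# Toy example: 12 of 15 workers pick the full span, 3 pick a shorter one,
# so both readings survive with different scores instead of one "winning".
workers = [{"Bishop Petronius"}] * 12 + [{"Petronius"}] * 3
print(crowd_sentence_entity_score(workers, "Bishop Petronius"))  # 0.8
print(crowd_sentence_entity_score(workers, "Petronius"))         # 0.2
```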
  24. Crowdsourcing for Better Ground Truth: Multi-NER+CrowdGT, the Multi-NER enhanced through crowd-driven ground truth gathering, performs better than Multi-NER+Crowd.
  25. Conclusions & Future Work
      • a hybrid, crowd-driven Multi-NER approach for improved NER performance
      • the crowd is able to correct the mistakes of the NER tools by reducing the total number of false positive cases
      • the crowd-driven ground truth, which harnesses diversity, perspectives and granularities, proves to be more reliable
      • future work: improving the event detection ground truth (FactBank) through disagreement-aware crowdsourcing
      • future work: training an event classifier with crowd-driven events
      Data: http://data.crowdtruth.org and http://data.crowdtruth.org/crowdsourcing-ne-goldstandards/
      Tool & Code: http://dev.CrowdTruth.org and http://github.com/CrowdTruth
