
Boosting Named Entity Extraction through Crowdsourcing


This talk presents preliminary results on improving NER performance by combining and harnessing diversity among machines and the crowd.



  1. Boosting Named Entity Extraction through Crowdsourcing
     What goes wrong with IE tools? What can we learn from the crowd?
     Oana Inel, Vrije Universiteit Amsterdam, 5th December 2016
  2. Named Entity Recognition: Observations
     NER tools:
     ● work best on a limited set of (predefined) entity types (e.g., people, places, organizations, and to some extent time)
     ● are all trained on different data
       ○ perform well only on particular types of data/entities
     ● have performance that is highly dependent on
       ○ the type of input text
       ○ the choice of gold standards
         ■ gold standards are not perfect
         ■ a large amount of training & evaluation data is needed
     ● show similar performance, but different entity coverage
       ○ different confidence scores
       ○ different (non-transparent) ways of computing them
  3. IE Tools Issues
     Problem:
     - difficult to understand the reliability of the different NER tools
     - difficult to choose "the best one" for your case
     Solution:
     - combined use, e.g., NERD
     - however, NERD also has problems:
       - on-the-spot reliance on the other NER tools
       - limited number of entity types identified
     - hence, the need for an alternative to NERD
  4. Combining Machines & Crowd for NER
     1. Choose multiple SOTA NER tools
     2. Combine (aggregate) their output
     3. Identify cases where the NER tools underperform
     4. Correct and improve the NER tools' output through crowdsourcing → improved ground truth
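A minimal sketch of step 2 (aggregating the tools' output), assuming each tool's extractions have already been normalized to character-offset spans; the function name, input format, and example spans below are illustrative, not the exact data structures used in this work.

    from collections import defaultdict

    def combine_ner_outputs(per_tool_spans):
        """per_tool_spans: dict mapping tool name -> list of (sentence_id, start, end, surface) spans."""
        combined = defaultdict(set)  # span -> set of tools that extracted it
        for tool, spans in per_tool_spans.items():
            for span in spans:
                combined[span].add(tool)
        return combined

    # Illustrative spans, not taken from the OKE data.
    per_tool_spans = {
        "DBpediaSpotlight": [(1, 0, 12, "Giulio Natta")],
        "TextRazor":        [(1, 0, 12, "Giulio Natta"), (1, 25, 32, "Imperia")],
        "THD":              [(1, 25, 32, "Imperia")],
    }
    for span, tools in combine_ner_outputs(per_tool_spans).items():
        print(span, "->", sorted(tools))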
  5. Use case
     5 NER tools:
     ● NERD-ML
     ● TextRazor
     ● SemiTags
     ● THD
     ● DBpedia Spotlight
     Comparative analysis on:
     - their individual performance (output)
     - their combined performance (output)
     using two existing gold standard datasets:
     - Open Knowledge Extraction (OKE) Challenge 2015 & 2016
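As one concrete example of collecting a single tool's output, the sketch below queries DBpedia Spotlight over its public REST endpoint; the URL, parameters, and JSON keys follow the public web service as documented and may differ from the Spotlight deployment used in this study.

    import requests

    def spotlight_annotate(text, confidence=0.4):
        resp = requests.get(
            "https://api.dbpedia-spotlight.org/en/annotate",
            params={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
            timeout=30,
        )
        resp.raise_for_status()
        # Each resource carries the surface form, character offset, and DBpedia URI.
        return [(r["@surfaceForm"], int(r["@offset"]), r["@URI"])
                for r in resp.json().get("Resources", [])]

    print(spotlight_annotate("Giulio Natta was born in Imperia, Italy."))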
  6. OKE 2015 & 2016 Datasets
     ● OKE challenge (ESWC) 2015
       ○ 101 sentences
       ○ 664 entities
         ■ Person: 304, Place: 120, Organization: 139, Role: 103
       ○ https://github.com/anuzzolese/oke-challenge
     ● OKE challenge (ESWC) 2016
       ○ 55 sentences
       ○ 340 entities
         ■ Person: 105, Place: 44, Organization: 105, Role: 86
       ○ https://github.com/anuzzolese/oke-challenge-2016
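The OKE gold standards in the linked repositories are distributed as NIF annotations; a sketch of reading the entity mentions with rdflib follows, assuming a local Turtle file (the file name is hypothetical and the serialization may need adjusting to the actual dataset files).

    from rdflib import Graph, Namespace

    NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")

    g = Graph()
    g.parse("oke2015-task1-gold.ttl", format="turtle")  # hypothetical local copy of the dataset

    # Every gold entity mention is annotated with its surface form and character offsets.
    for mention in g.subjects(predicate=NIF.anchorOf):
        print(g.value(mention, NIF.anchorOf),
              g.value(mention, NIF.beginIndex),
              g.value(mention, NIF.endIndex))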
  7. NER Performance: entity surface
     ● High disagreement between the NER tools
       ○ Similar performance in F1, but different #FP, #TP, #FN
     ● Low recall, many entities missed
  8. NER Performance: entity surface
     ● High disagreement between the NER tools
       ○ Similar performance in F1, but different #FP, #TP, #FN
     ● NERD seems to perform the best on F1
  9. NER Performance: entity surface
     ● High disagreement between the NER tools
       ○ Similar performance in F1, but different #FP, #TP, #FN
     ● NERD seems to perform the best on F1
     ● CombinedNER: significantly higher #TP & lower #FN
  10. NER Performance: entity surface
      ● High disagreement between the NER tools
        ○ Similar performance in F1, but different #FP, #TP, #FN
      ● NERD seems to perform the best on F1
      ● CombinedNER: significantly higher #TP & lower #FN
      ● CombinedNER: significantly higher #FP
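Slides 7-10 compare the tools on #TP, #FP, #FN and F1; for reference, this is the standard mapping from those counts to precision, recall, and F1 (the counts below are made up for illustration, not the reported results).

    def precision_recall_f1(tp, fp, fn):
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        return precision, recall, f1

    print(precision_recall_f1(tp=250, fp=80, fn=120))  # illustrative counts only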
  11. CombinedNER vs. SOTA NER
      The more the merrier? Is performance correlated with the number of NER tools that extracted a given named entity?
      ● Performance comparison:
        ○ applied CrowdTruth metrics on CombinedNER
        ○ likelihood of an entity being contained in the gold standard, based on how many NER tools extracted it
      Sentence-entity score = ratio of NER tools that extracted the entity
  12. CombinedNER vs. SOTA NER
      CombinedNER outperforms the state-of-the-art NER tools at a sentence-entity score >= 0.4, which is also better than a majority-vote approach.
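Following the definition on slide 11 and the 0.4 threshold reported here, a small sketch of scoring and filtering the combined output; it reuses the span-to-tools mapping shape from the earlier aggregation sketch, and the data structures are illustrative.

    def sentence_entity_score(span, combined, n_tools=5):
        """Fraction of the NER tools that extracted the given span."""
        return len(combined.get(span, set())) / n_tools

    def filter_combined(combined, n_tools=5, threshold=0.4):
        """Keep only spans extracted by at least `threshold` of the tools (0.4, i.e., 2 of 5)."""
        return [span for span in combined
                if sentence_entity_score(span, combined, n_tools) >= threshold]

    combined = {(1, 0, 12, "Giulio Natta"): {"TextRazor", "DBpediaSpotlight"},
                (1, 25, 32, "Imperia"): {"THD"}}
    print(filter_combined(combined))  # only "Giulio Natta" survives the 0.4 cut-off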
  13. Where do NER tools fail and why?
  14. NER Performance: entity surface & type
      ● many instances of "people" were missed
  15. Deeper Look into the Ground Truth: People
      ● Personal pronouns (co-references) and possessive pronouns are considered named entities of type "person"
        ○ 83/85 cases in OKE2015
        ○ 26/27 cases in OKE2016
        "Giulio Natta was born in Imperia, Italy. [He] earned [his] degree in chemical engineering from the Politecnico di Milano university in Milan in 1924."
      ● There are also errors in the ground truth
        ○ 1 case in OKE2015
        "[One of the them] was an eminent scholar at Berkeley."
  16. NER Performance: entity surface & type
      ● many instances of "people" were missed
      ● only a few "places" were missed in 2015, and none in 2016
  17. Deeper Look into the Ground Truth: Places
      ● There are concatenations of multiple entities of type place, e.g., City, Country
        ○ 4/4 cases in OKE2015
        "Such a man did post-doctoral work at the Salk Institute in San Diego in the laboratory of Renato Dulbecco, then worked at the Basel Institute for Immunology in [Basel, Switzerland]."
        ○ but the offsets given in the GT do not match the actual string
      ● Inconsistencies across datasets
        ○ in 2016, such entities were classified as two separate entities of type place
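The offset mismatch noted above is easy to detect mechanically; a simple consistency check might look as follows (the field layout is illustrative, not the OKE/NIF schema).

    def offsets_match(sentence_text, begin, end, surface):
        """True if the gold-standard offsets actually select the annotated surface form."""
        return sentence_text[begin:end] == surface

    sent = ("Such a man did post-doctoral work at the Salk Institute in San Diego in the "
            "laboratory of Renato Dulbecco, then worked at the Basel Institute for "
            "Immunology in Basel, Switzerland.")
    start = sent.index("Basel, Switzerland")
    print(offsets_match(sent, start, start + len("Basel, Switzerland"), "Basel, Switzerland"))  # True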
  18. NER Performance: entity surface & type
      ● many instances of "people" were missed
      ● only a few "places" were missed in 2015, and none in 2016
      ● many FP for entities of type "organization"
  19. Deeper Look into the Ground Truth: Organization
      ● Many entities of type "organization" are a combination of "organization" + "place"
        ○ NER tools tend to extract the entity at each granularity
        ○ the GT does not allow overlapping entities or multiple perspectives
          ■ 105/213 cases in OKE2015
          ■ 62/157 cases in OKE2016
        "Such a man did post-doctoral work at the Salk Institute in San Diego in the laboratory of Renato Dulbecco, then worked at the [[[Basel] [Institute]] for Immunology] in Basel, Switzerland."
  20. NER Performance: entity surface & type
      ● many instances of "people" were missed
      ● only a few "places" were missed in 2015, and none in 2016
      ● many FP for entities of type "organization"
      ● several FP for entities of type "people" and "role"
  21. Deeper Look into the Ground Truth: People & Role
      ● Multiple span variations for the same entity of type "person"
        ○ 73/92 cases in OKE2015
        ○ 9/13 cases in OKE2016
        "The woman was awarded the Nobel Prize in Physics in 1963, which she shared with [[J.] [[Hans] D.] [Jensen]] and Eugene Wigner."
      ● Inconsistencies & ambiguous combinations of types "role" and "person"
        ○ Bishop Petronius → person
        ○ but Queen Elizabeth II was not typed as "person"
          ■ Queen → role
          ■ Elizabeth II → person
      ● Many combinations of "person" and "role", especially when the "person" is an ethnic group (e.g., French author, Canadian citizen)
        ○ 9/92 cases in OKE2015
        ○ 2/13 cases in OKE2016
  22. Crowdsourcing for a Better Ground Truth
      ● Crowd-driven ground truth
      ● Case 1: the crowd reduces the number of #FP
        ○ for each entity that has multiple variations (span alternatives), we create an entity cluster
      ● Case 2: the crowd reduces the number of #FN
        ○ for each entity that was not extracted, we create a cluster with the partial overlaps, but also with every other combination of words contained in the overlap
      ● Goal:
        ○ identify all the valid expressions and their types
        ○ decrease the number of FP and the number of FN
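A sketch of how the Case 2 candidate clusters could be generated, assuming whitespace tokenization of the missed or overlapping region; this is a simplification of whatever span-candidate generation was actually used in the crowdsourcing setup.

    def span_cluster(region):
        """Every contiguous word span in the region, offered to the crowd as candidate entities."""
        words = region.split()
        return [" ".join(words[i:j])
                for i in range(len(words))
                for j in range(i + 1, len(words) + 1)]

    print(span_cluster("Queen Elizabeth II"))
    # ['Queen', 'Queen Elizabeth', 'Queen Elizabeth II', 'Elizabeth', 'Elizabeth II', 'II']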
  23. Crowdsourcing task: template
  24. Results: CombinedNER vs. CombinedNER+Crowd
      CombinedNER+Crowd outperforms CombinedNER for each crowd-entity score.
      Crowd-entity score = likelihood of an entity being a valid entity in the dataset (based on CrowdTruth).
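As a rough stand-in for the crowd-entity score: the fraction of crowd workers who marked a span as a valid entity. The actual CrowdTruth metrics also weight workers and annotations by quality, which this sketch omits.

    def crowd_entity_score(workers_selecting_span, total_workers):
        """Fraction of crowd workers who judged the span to be a valid entity."""
        return workers_selecting_span / total_workers

    print(crowd_entity_score(workers_selecting_span=9, total_workers=15))  # 0.6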
  25. Conclusions
      ● difficult to find one NER tool that performs well
      ● combining the output of several NER tools results in disagreement
      But:
      ● using crowdsourcing to correct and improve their output results in a better outcome
      ● furthermore, the crowd can help us identify problems in the GT
