(Presentation Chris) Crowdsourcing & Semantic Web: Dagstuhl 2014


How to Measure Quality with Disagreement?
or the Three Sides of CrowdTruth

Published in: Technology, Business

  1. 1. How to Measure Quality with Disagreement? or the Three Sides of CrowdTruth Lora Aroyo & Chris Welty
  2. 2. CrowdTruth: Annotator disagreement is signal, not noise. It is indicative of the variation in human semantic interpretation of signs. It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality.
  3. 3. CrowdTruth Dependencies: worker metrics for detecting spam → quality of sentences → quality of the target semantics. Worker quality metrics can improve significantly when the quality of these other aspects of semantic interpretation is considered.
  4. 4. The Three Sides of CrowdTruth
  5. 5. Representation Worker Vector 1 1 1
  6. 6. Representation: Sentence Vector. The individual worker annotations (fifteen 1s across the worker vectors) sum per relation to the sentence vector: 0 1 1 0 0 4 3 0 0 5 1 0
  7. 7. Disagreement for Sentence Clarity. Sentence: "Feeling the way the CHEST expands (PALPATION), can identify areas of the lung that are full of fluid." Question: Is CHEST related to PALPATION? Relation labels shown: diagnose, location, associated with, is_a, other, part_of. Annotation counts as extracted: 0 0 02 3 0 0 0 1 0 0 44 1. The unclear relationship between the two arguments is reflected in the disagreement.
  8. 8. Disagreement for Sentence Clarity. Sentence: "Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora) of the eyes are symptoms common to all forms of CONJUNCTIVITIS." Question: Is HYPERAEMIA related to CONJUNCTIVITIS? Relation labels shown: symptom, cause. Annotation counts as extracted: 0 0 0 1 0 0 0 013 0 0 0 0 0. The clearly expressed relation between the two arguments is reflected in the agreement.
  9. 9. Sentence-Relation Score: measures how clearly a sentence expresses a relation. Sentence vector: 0 1 1 0 0 4 3 0 0 5 1 0. Cosine of the sentence vector with the unit vector for relation R6 = .55
  10. 10. Worker Disagreement: measured per worker. Worker-sentence disagreement: AVG (Cosine) between the worker's sentence vector and the sentence vector (0 1 1 0 0 4 3 0 0 5 1 0).
  11. 11. Worker Metrics. How much a worker disagrees with the crowd per sentence → the avg of all cosine distances between each worker's sentence vector & the full sentence vector (minus that worker). Are there consistently like-minded workers? → pairwise metric, avg for a particular worker → there may be communities of thought that consistently disagree with others, but agree within themselves. Low quality workers generally have high scores in both. Avg relations per sentence → per worker, the number of relations he/she chooses per sentence, averaged over all sentences he/she annotates. A high score here can help indicate low quality workers.
  12. 12. Sentence Metrics. Sentence-relation score → core CrowdTruth metric for relation extraction → measured for each relation on each sentence as the cosine of the unit vector for the relation with the sentence vector, indicating that a relation is clearly or vaguely expressed. Sentence clarity → defined for each sentence as the max relation score for that sentence, indicating a clear or ambiguous or confusing sentence.
  13. 13. Relation Metrics. Relation similarity → the causal power (pairwise conditional probability). A high similarity score indicates the relations are confusable to workers. Relation ambiguity → defined for each relation as the max relation similarity for the relation. If a relation is clear, then it will have a low score. Relation clarity → defined for each relation as the max sentence-relation score for the relation over all sentences. If a relation has a high clarity score, it means that it is at least possible to express the relation clearly. Relation frequency → the number of times the relation is annotated at least once in a sentence.
  14. 14. Impact of Dependencies
  15. 15. Impact of Dependencies
  16. 16. Impact of Sentence Quality on Worker Quality. Figure (a): the space with no filtering of sentences or relations; a single line cannot separate the spammers from non-spammers. Figure (b): the space after sentence filtering. Figure (c): after relation filtering. Figure (d): after both sentence and relation filtering. Sentence filtering makes the classes linearly separable, and the separation between the classes increases in the subsequent figures.
  17. 17. Impact of Relation Quality on Worker Quality. Figure (a): the space with no filtering of sentences or relations; a single line cannot separate the spammers from non-spammers. Figure (c): after relation filtering. The relation filtering much more clearly defines the space, with a large separation between positive and negative instances. The pairwise improvements to the worker scores are significant with p < .001, which is better than the sentence clarity improvements.
  18. 18. Combining Sentence & Relation Filtering • first filtering out low clarity sentences • then filtering vague and ambiguous relations • worker metrics were computed on these new sentences and vectors • this separates the space even further, and the pairwise improvement in worker scores from the baseline (unfiltered) is significant with p < .0005 • the improvement over sentence filtering alone is also significant (p < .01) • the improvement over relation filtering alone is only significant with p < .05
  19. 19. Quality measures in semantic interpretation tasks are inter-dependent. Higher accuracy can be achieved by considering the impact of sentence quality & relation quality on worker quality measurements. There is a significant improvement in worker quality metrics with respect to known spammers when the quality of the individual sentences & target relations is incorporated. Open questions are the relationships between the other corners of the triangle of reference, e.g. → the impact of relation & worker quality on sentence measures, → the impact of worker & sentence quality on relation measures.
  20. 20. crowdtruth.org
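
The sentence and worker metrics on slides 9 to 12 can be sketched in a few lines of Python. This is a minimal illustration, not the CrowdTruth implementation: the function names are hypothetical, relation R6 is assumed to be the sixth position of the vector, and "cosine distance" is assumed to mean 1 minus the cosine similarity, which the slides do not spell out.

```python
import math


def sentence_vector(worker_vectors):
    """Sum the workers' annotation vectors position by position."""
    return [sum(column) for column in zip(*worker_vectors)]


def cosine(u, v):
    """Cosine similarity; defined here as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_relation_score(sent_vec, relation_index):
    """Cosine of the relation's unit vector with the sentence vector."""
    unit = [0] * len(sent_vec)
    unit[relation_index] = 1
    return cosine(unit, sent_vec)


def sentence_clarity(sent_vec):
    """Max sentence-relation score over all relations (slide 12)."""
    return max(sentence_relation_score(sent_vec, i) for i in range(len(sent_vec)))


def worker_sentence_disagreement(worker_vec, sent_vec):
    """Distance between a worker's vector and the sentence vector minus that worker.

    Assumption: distance = 1 - cosine similarity.
    """
    rest = [s - w for s, w in zip(sent_vec, worker_vec)]
    return 1.0 - cosine(worker_vec, rest)


# Sentence vector shown on slide 9.
SLIDE_9_VECTOR = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]

# Assumption: R6 is the sixth position, i.e. index 5.
print(round(sentence_relation_score(SLIDE_9_VECTOR, 5), 2))  # 0.55, as on slide 9
print(round(sentence_clarity(SLIDE_9_VECTOR), 2))            # 0.69

# Hypothetical worker who selected only R6 for this sentence.
one_worker = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(worker_sentence_disagreement(one_worker, SLIDE_9_VECTOR))
```

Averaging `worker_sentence_disagreement` over all sentences a worker annotated would give the per-worker score described on slide 11; the filtering thresholds used on slides 16 to 18 are not given in the slides and are left out here.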