(Presentation Chris) Crowdsourcing & Semantic Web: Dagstuhl 2014
 

How to Measure Quality with Disagreement?
or the Three Sides of CrowdTruth

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

    Presentation Transcript

    • How to Measure Quality with Disagreement? Or, the Three Sides of CrowdTruth. Lora Aroyo & Chris Welty
    • CrowdTruth: Annotator disagreement is signal, not noise. It is indicative of the variation in human semantic interpretation of signs. It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality.
    • CrowdTruth Dependencies: worker metrics for detecting spam → quality of sentences → quality of the target semantics. Worker quality metrics can improve significantly when the quality of these other aspects of semantic interpretation is considered.
    • The Three Sides of CrowdTruth
    • Representation: Worker Vector (figure; example entries: 1 1 1)
    • Representation: Sentence Vector (figure; the individual worker annotations, shown as fifteen 1s, add up to the sentence vector 0 1 1 0 0 4 3 0 0 5 1 0)
    • Disagreement for Sentence Clarity. Sentence: "Feeling the way the CHEST expands (PALPATION), can identify areas of the lung that are full of fluid." Question: Is CHEST related to PALPATION? (Figure: sentence vector of annotation counts; relation labels shown include diagnose, location, associated with, is_a, other, part_of; counts as extracted: 0 0 02 3 0 0 0 1 0 0 44 1.) The unclear relationship between the two arguments is reflected in the disagreement.
    • Disagreement for Sentence Clarity. Sentence: "Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora) of the eyes are symptoms common to all forms of CONJUNCTIVITIS." Question: Is HYPERAEMIA related to CONJUNCTIVITIS? (Figure: sentence vector of annotation counts; relation labels shown include symptom and cause; counts as extracted: 0 0 0 1 0 0 0 013 0 0 0 0 0.) The clearly expressed relation between the two arguments is reflected in the agreement.
    • Sentence-Relation Score: measures how clearly a sentence expresses a relation. Sentence vector: 0 1 1 0 0 4 3 0 0 5 1 0. Cosine of the sentence vector with the unit vector for relation R6 = .55.
    • Worker Disagreement: measured per worker. Worker-sentence disagreement: AVG(cosine) between the worker's sentence vector and the sentence vector (0 1 1 0 0 4 3 0 0 5 1 0).
    • Worker Metrics. How much a worker disagrees with the crowd per sentence → the average of all cosine distances between each worker's sentence vector and the full sentence vector (minus that worker). Are there consistently like-minded workers? → a pairwise metric, averaged for a particular worker; there may be communities of thought that consistently disagree with others but agree among themselves. Low-quality workers generally have high scores in both. Average relations per sentence → per worker, the number of relations he/she chooses per sentence, averaged over all sentences he/she annotates. A high score here can help indicate low-quality workers.
    • Sentence Metrics. Sentence-relation score → the core CrowdTruth metric for relation extraction; measured for each relation on each sentence as the cosine of the unit vector for the relation with the sentence vector, indicating whether a relation is clearly or vaguely expressed. Sentence clarity → defined for each sentence as the max relation score for that sentence, indicating a clear, ambiguous, or confusing sentence.
    • Relation Metrics. Relation similarity → the causal power (pairwise conditional probability); a high similarity score indicates that the relations are confusable to workers. Relation ambiguity → defined for each relation as the max relation similarity for the relation; if a relation is clear, it will have a low score. Relation clarity → defined for each relation as the max sentence-relation score for the relation over all sentences; a high clarity score means that it is at least possible to express the relation clearly. Relation frequency → the number of times the relation is annotated at least once in a sentence.
    • Impact of Dependencies
    • Impact of Sentence Quality on Worker Quality: (a) the space with no filtering of sentences or relations, where a single line cannot separate the spammers from the non-spammers; (b) the space after sentence filtering; (c) after relation filtering; and (d) after both sentence and relation filtering. Sentence filtering makes the classes linearly separable, and the separation between the classes increases in the subsequent figures.
    • Impact of Relation Quality on Worker Quality: (a) the space with no filtering of sentences or relations, where a single line cannot separate the spammers from the non-spammers; (c) the space after relation filtering. Relation filtering defines the space much more clearly, with a large separation between positive and negative instances. The pairwise improvements to the worker scores are significant with p < .001, which is better than the sentence clarity improvements.
    • Combining Sentence & Relation Filtering: first, low-clarity sentences are filtered out; then vague and ambiguous relations are filtered; worker metrics are then computed on the new sentences and vectors. This separates the space even further, and the pairwise improvement in worker scores from the baseline (unfiltered) is significant with p < .0005. The improvement over sentence filtering alone is also significant (p < .01). The improvement over relation filtering alone is only significant with p < .05.
    • Quality measures in semantic interpretation tasks are inter-dependent. Higher accuracy can be achieved by considering the impact of sentence quality & relation quality on worker quality measurements: worker quality metrics improve significantly with respect to known spammers when the quality of the individual sentences & target relations is incorporated. Other relationships between the different corners of the triangle of reference include, e.g., → the impact of relation & worker quality on sentence measures, → the impact of worker & sentence quality on relation measures.
    • crowdtruth.org
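
The sentence-relation score and sentence clarity from the Sentence Metrics slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function names are hypothetical, and it assumes that R6 is the sixth entry of the slide's sentence vector, which is consistent with the reported cosine of .55.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def sentence_relation_score(sentence_vector, relation_index):
    """Cosine of the sentence vector with the unit vector for one relation."""
    unit = [0] * len(sentence_vector)
    unit[relation_index] = 1
    return cosine(sentence_vector, unit)

def sentence_clarity(sentence_vector):
    """Max sentence-relation score over all relations of the sentence."""
    return max(
        sentence_relation_score(sentence_vector, i)
        for i in range(len(sentence_vector))
    )

# Sentence vector from the Sentence-Relation Score slide.
sentence_vector = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
r6_score = sentence_relation_score(sentence_vector, 5)  # assumed: R6 is index 5
clarity = sentence_clarity(sentence_vector)

print(round(r6_score, 2))  # 0.55, matching the slide
print(round(clarity, 2))   # 0.69, from the entry with count 5
```

Because the other vector is a unit vector, the score reduces to the relation's count divided by the length of the sentence vector, here 4 / sqrt(53).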
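
The worker-sentence disagreement and the average relations per sentence from the Worker Metrics slide might be computed as follows. This is a sketch under stated assumptions: cosine distance is taken to be 1 minus cosine similarity, annotations are stored as a hypothetical nested dictionary of binary vectors, and the worker IDs and toy data are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def worker_sentence_disagreement(annotations, worker):
    """Average cosine distance between a worker's vector and the rest of the crowd.

    annotations: {sentence_id: {worker_id: binary relation vector}} (assumed layout).
    """
    distances = []
    for by_worker in annotations.values():
        if worker not in by_worker:
            continue
        others = [v for w, v in by_worker.items() if w != worker]
        if not others:
            continue  # no crowd to compare against on this sentence
        # Sentence vector minus this worker: column-wise sum of the other vectors.
        rest = [sum(column) for column in zip(*others)]
        distances.append(1.0 - cosine(by_worker[worker], rest))
    return sum(distances) / len(distances) if distances else 0.0

def avg_relations_per_sentence(annotations, worker):
    """Mean number of relations the worker selected per annotated sentence."""
    counts = [sum(bw[worker]) for bw in annotations.values() if worker in bw]
    return sum(counts) / len(counts) if counts else 0.0

# Toy data: w1 and w2 agree; w3 selects every relation on every sentence.
annotations = {
    "s1": {"w1": [1, 0, 0], "w2": [1, 0, 0], "w3": [1, 1, 1]},
    "s2": {"w1": [0, 1, 0], "w2": [0, 1, 0], "w3": [1, 1, 1]},
}
d_w1 = worker_sentence_disagreement(annotations, "w1")
d_w3 = worker_sentence_disagreement(annotations, "w3")
print(d_w1 < d_w3)                                    # True
print(avg_relations_per_sentence(annotations, "w3"))  # 3.0
```

In this toy data the worker who selects everything scores higher on both metrics, which is the pattern the slide associates with low-quality workers.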
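
The four relation metrics can be illustrated on a small set of sentence vectors. The slide describes relation similarity only as "causal power (pairwise conditional probability)", so the sketch below makes a simplifying assumption: similarity is the plain conditional probability that a second relation is annotated in a sentence given that the first one is. The direction of the conditional, the function names, and the data are all assumptions.

```python
import math

def relation_frequency(sentence_vectors, r):
    """Number of sentences in which relation r was annotated at least once."""
    return sum(1 for v in sentence_vectors if v[r] > 0)

def relation_similarity(sentence_vectors, r, other):
    """Assumed form: P(other annotated in a sentence | r annotated in it)."""
    with_r = [v for v in sentence_vectors if v[r] > 0]
    if not with_r:
        return 0.0
    return sum(1 for v in with_r if v[other] > 0) / len(with_r)

def relation_ambiguity(sentence_vectors, r):
    """Max similarity of relation r to any other relation."""
    n = len(sentence_vectors[0])
    return max(
        relation_similarity(sentence_vectors, r, o) for o in range(n) if o != r
    )

def relation_clarity(sentence_vectors, r):
    """Max sentence-relation score for relation r over all sentences."""
    def score(v):
        norm = math.sqrt(sum(x * x for x in v))
        return v[r] / norm if norm else 0.0  # cosine with the unit vector for r
    return max(score(v) for v in sentence_vectors)

# Three sentences, three relations (invented counts).
sentences = [[3, 1, 0], [0, 4, 0], [2, 2, 0]]

print(relation_frequency(sentences, 2))       # 0
print(relation_similarity(sentences, 0, 1))   # 1.0
print(relation_ambiguity(sentences, 0))       # 1.0
print(relation_clarity(sentences, 1))         # 1.0
```

Here relation 0 never appears without relation 1, so it gets the maximum ambiguity score, while relation 1 has a sentence in which it is the only relation chosen and therefore reaches full clarity.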
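
The two filtering steps from the Combining Sentence & Relation Filtering slide could be applied to the annotation data before worker metrics are recomputed. The transcript gives no thresholds, so `min_clarity` and the set of dropped relations below are hypothetical inputs, as are the data layout and the function names; this is a sketch of the idea rather than the authors' procedure.

```python
import math

def sentence_clarity(sentence_vector):
    """Max sentence-relation score: the largest count divided by the vector length."""
    norm = math.sqrt(sum(x * x for x in sentence_vector))
    return max(sentence_vector) / norm if norm else 0.0

def filter_annotations(annotations, min_clarity, dropped_relations):
    """Drop low-clarity sentences, then remove the columns of dropped relations.

    annotations: {sentence_id: {worker_id: binary relation vector}} (assumed layout).
    min_clarity and dropped_relations are hypothetical, caller-chosen values.
    """
    filtered = {}
    for sentence_id, by_worker in annotations.items():
        # Sentence vector: column-wise sum of all worker vectors.
        sentence_vector = [sum(column) for column in zip(*by_worker.values())]
        if sentence_clarity(sentence_vector) < min_clarity:
            continue  # step 1: sentence filtering
        filtered[sentence_id] = {
            worker: [x for i, x in enumerate(vector) if i not in dropped_relations]
            for worker, vector in by_worker.items()
        }  # step 2: relation filtering
    return filtered

annotations = {
    "s1": {"w1": [1, 0, 0], "w2": [1, 0, 0]},                     # clear sentence
    "s2": {"w1": [1, 0, 0], "w2": [0, 1, 0], "w3": [0, 0, 1]},    # total disagreement
}
filtered = filter_annotations(annotations, min_clarity=0.7, dropped_relations={2})
print(filtered)  # {'s1': {'w1': [1, 0], 'w2': [1, 0]}}
```

Worker metrics would then be computed on `filtered` instead of the raw annotations.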