Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Truth is a Lie: Rules & Semantics from Crowd Perspectives (RR'2015 Keynote)

3,316 views

Published on

http://crowdtruth.org

Processing real-world data with the crowd leaves one thing absolutely clear - there is no single notion of truth, but rather a spectrum that has to account for context, opinions, perspectives and shades of grey. CrowdTruth is a new framework for processing of human semantics drawn more from the notion of consensus then from set theory.

Published in: Technology
  • Be the first to comment

Truth is a Lie: Rules & Semantics from Crowd Perspectives (RR'2015 Keynote)

  1. 1. Web & Media Group http://lora-aroyo.org @laroyo Truth is a Lie Rules & Semantics from Crowd Perspectives
  2. 2. Web & Media Group http://lora-aroyo.org @laroyo “in embracing the diversity of human beings, we will find a surer way to true happiness” Malcolm Gladwell, 2006 in his TED talk “Choice, happiness and spaghetti sauce”
  3. 3. Web & Media Group http://lora-aroyo.org @laroyo “If we treat human brains as processors in a distributed system, each can perform a small part of a massive computation” Luis von Ahn, 2006 “Games with a Purpose”, Computer Magazine
  4. 4. Web & Media Group http://lora-aroyo.org @laroyo “The challenge for Watson was to answer questions posed in every nuance of natural language. The first step in accomplishing this was to have a lot of questions and their correct answers. Any cognitive system must take this first step.” Chris Welty, 2014 “Cognitive Computing and the Future”, 1st Cognitive Computing Forum
  5. 5. Web & Media Group http://lora-aroyo.org @laroyo Take Home Message True  and  False  is  not  enough,     There  is  diversity  in  human  interpreta3on     CrowdTruth  introduces  a  spa3al  representa3on  of   meaning  that  through  crowdsourcing     harnesses  disagreement     Using  CrowdTruth   untrained  workers  can  be  just  as  reliable  as     highly  trained  experts  
  6. 6. Web & Media Group http://lora-aroyo.org @laroyo Is this song …. Passionate Rousing Confident Boisterous Rowdy Literate Poignant Wistful Bittersweet Autumnal Brooding Rollicking Cheerful Fun Sweet Amiable Good-natured Humorous Silly Campy Whimsical Witty Wry Aggressive Fiery Tense Anxious Intense Volatile
  7. 7. Web & Media Group http://lora-aroyo.org @laroyo ?Passionate Rousing Confident Boisterous Rowdy Literate Poignant Wistful Bittersweet Autumnal Brooding Rollicking Cheerful Fun Sweet Amiable Good-natured Humorous Silly Campy Whimsical Witty Wry Aggressive Fiery Tense Anxious Intense Volatile Is this song ….
  8. 8. Web & Media Group http://lora-aroyo.org @laroyo ?- category(“That’s not my name”, x) ?Passionate Rousing Confident Boisterous Rowdy Literate Poignant Wistful Bittersweet Autumnal Brooding Rollicking Cheerful Fun Sweet Amiable Good-natured Humorous Silly Campy Whimsical Witty Wry Aggressive Fiery Tense Anxious Intense Volatile (Lee and Hu 2012)
  9. 9. Web & Media Group http://lora-aroyo.org @laroyo Patients with TYPHUS who were given ANTIBIOTICS exhibited several side-effects. With ANTIBIOTICS in short supply, DDT was used during World War II to control the insect vectors of TYPHUS. ANTIBIOTICS are the first line treatment for indications of TYPHUS. Does this sentence express TREATS(Antibiotics, Typhus)?
  10. 10. Web & Media Group http://lora-aroyo.org @laroyo TREATS(Antibiotics, Typhus) Is this True?
  11. 11. Web & Media Group http://lora-aroyo.org @laroyo What is the treatment for typhus? Antibiotic therapy is recommended for both endemic and epidemic typhus infections because early treatment with antibiotics (for example, azithromycin, doxycycline, tetracycline, or chloramphenicol) can cure most people infected with the bacteria. Consultation with an infectious-disease expert is advised especially if epidemic typhus or typhus in pregnant females is diagnosed. Delays in treatment may allow renal, lung, or nervous system problems to develop. Some patients, especially the elderly, may die. TREATS(Antibiotics, Typhus) Is this True?
  12. 12. Web & Media Group http://lora-aroyo.org @laroyo TREATS(Corticosteroids, Lupus) Is this True? What is the treatment for lupus? In more severe cases of Lupus, medications that modulate the immune system (mainly corticosteroids, immunosuppressants) are used to control the disease and prevent recurrence of symptoms (known as flares). Depending on the dosage, people who require steroids may develop Cushing's syndrome. This may subside if and when the large initial dosage is reduced, but long-term use of even low doses can cause elevated blood pressure and cataracts. Numerous new immunosuppressive drugs are being actively tested. Rather than suppressing the immune system nonspecifically, as corticosteroids do, they target the responses of individual [types of] immune cells.
  13. 13. Web & Media Group http://lora-aroyo.org @laroyo Human Interpretation “I will never understand people” “They are the worst”
  14. 14. Web & Media Group http://lora-aroyo.org @laroyo human subjectivity, ambiguity & uncertainty of expression are part of human semantics
  15. 15. Web & Media Group http://lora-aroyo.org @laroyo current practices in computer science (AI) are based on fallacies
  16. 16. Web & Media Group http://lora-aroyo.org @laroyo single truth
  17. 17. Web & Media Group http://lora-aroyo.org @laroyo experts rule
  18. 18. Web & Media Group http://lora-aroyo.org @laroyo disagreement is bad
  19. 19. Web & Media Group http://lora-aroyo.org @laroyo simplification of context this all results in
  20. 20. Web & Media Group http://lora-aroyo.org @laroyo
  21. 21. Web & Media Group http://lora-aroyo.org @laroyo this means our systems understand only simplified black & white world problem ..
  22. 22. Web & Media Group http://lora-aroyo.org @laroyo reality is complex diverse but …
  23. 23. Web & Media Group http://lora-aroyo.org @laroyo current gold standard practices are not adequate any more so...
  24. 24. Web & Media Group http://lora-aroyo.org @laroyo there is a gap we need to bridge and …
  25. 25. Web & Media Group http://lora-aroyo.org @laroyo interestingly …
  26. 26. Web & Media Group http://lora-aroyo.org @laroyo in humanities … diversity is well accepted
  27. 27. Web & Media Group http://lora-aroyo.org @laroyo in sciences … diversity is taboo
  28. 28. Web & Media Group http://lora-aroyo.org @laroyo despite numerous examples … string theory debate
  29. 29. Web & Media Group http://lora-aroyo.org @laroyo global warming debate and …
  30. 30. Web & Media Group http://lora-aroyo.org @laroyo evolutionary bio. debate gradualismpunctuated equilibrium and …
  31. 31. Web & Media Group http://lora-aroyo.org @laroyo What can we do? so …
  32. 32. Web & Media Group http://lora-aroyo.org @laroyo •  collective decisions of large groups of people •  a group of error-prone decision- makers can be surprisingly good at picking the best choice •  when thumbs up or thumbs down - the chance of picking the right answer needs to be > 50% •  the odds that a most of them will pick the right answer is greater than any of them will pick it on their own •  performance gets better as size grows 1785 Marquis de Condorcet “wisdom of crowds”
  33. 33. Web & Media Group http://lora-aroyo.org @laroyo • asked 787 people to guess the weight of an ox • none got the right answer • their collective guess was almost perfect 1906 Sir Francis Galton “wisdom of crowds”
  34. 34. Web & Media Group http://lora-aroyo.org @laroyo WWII Math Rosies 1942: Ballistics calculations and flight trajectories
  35. 35. Web & Media Group http://lora-aroyo.org @laroyo NASA’s Computer Room transcribe raw flight data from celluloid film & oscillograph paper
  36. 36. Web & Media Group http://lora-aroyo.org @laroyo “Who Wants to Be a Millionaire?” •  help from individual expert or audience poll •  majority of the audience right 91% of the time •  individuals right only 65% of the time “wisdom of crowds” 2004 James Surowiecki
  37. 37. Web & Media Group http://lora-aroyo.org @laroyo Crowdsourcing Human Computing
  38. 38. Web & Media Group http://lora-aroyo.org @laroyo the wise crowd diversity of opinion independent perspectives multitude of contexts gives the big picture James Surowiecki
  39. 39. Web & Media Group http://lora-aroyo.org @laroyo current practices in computer science (AI) are based on fallacies
  40. 40. Web & Media Group http://lora-aroyo.org @laroyo today’s real-world complex problem solving needs understanding of multiple perspectives current practices in computer science (AI) are based on fallacies
  41. 41. Web & Media Group http://lora-aroyo.org @laroyo single truth
  42. 42. Web & Media Group http://lora-aroyo.org @laroyo multiple truths single truth
  43. 43. Web & Media Group http://lora-aroyo.org @laroyo experts rule
  44. 44. Web & Media Group http://lora-aroyo.org @laroyo wisdom of crowds experts rule
  45. 45. Web & Media Group http://lora-aroyo.org @laroyo disagreement is bad
  46. 46. Web & Media Group http://lora-aroyo.org @laroyo disagreement is bad disagreement as signal
  47. 47. Web & Media Group http://lora-aroyo.org @laroyo “best collective decisions are result of disagreement, not consensus or compromise” James Surowiecki because …
  48. 48. Web & Media Group http://lora-aroyo.org @laroyo can we harness it?
  49. 49. Web & Media Group http://lora-aroyo.org @laroyo Is this song …. ? Passionate Rousing Confident Boisterous Rowdy Literate Poignant Wistful Bittersweet Autumnal Brooding Rollicking Cheerful Fun Sweet Amiable Good-natured Humorous Silly Campy Whimsical Witty Wry Aggressive Fiery Tense Anxious Intense Volatile
  50. 50. Web & Media Group http://lora-aroyo.org @laroyo If “One Truth” & “No Disagreement” Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 W1 1 W2 1 W3 1 W4 1 W5 1 W6 1 W7 W8 W9 1 W10 1 Totals 1 3 1 2 1
  51. 51. Web & Media Group http://lora-aroyo.org @laroyo Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other W1 1 1 1 W2 1 1 1 W3 1 1 1 W4 1 1 W5 1 1 W6 1 1 1 W7 1 1 1 W8 1 1 1 W9 1 1 W10 1 1 1 1 1 Totals 3 5 6 5 2 8 If “Many Truths” & “Disagreement”
  52. 52. Web & Media Group http://lora-aroyo.org @laroyo can indicate alternative interpretations Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other W10 1 1 1 1 1 Totals 3 5 6 5 2 8 Disagreement as Signal can indicate ambiguity in the categorisation can indicate low quality workers
  53. 53. Web & Media Group http://lora-aroyo.org @laroyo Does this sentence express TREATS(Antibiotics, Typhus)? Patients with TYPHUS who were given ANTIBIOTICS exhibited several side-effects. With ANTIBIOTICS in short supply, DDT was used during World War II to control the insect vectors of TYPHUS. ANTIBIOTICS are the first line treatment for indications of TYPHUS. 95% 75% 50% Disagreement is signal here too
  54. 54. Web & Media Group http://lora-aroyo.org @laroyo Sentence Treat Prevent Cause Side Effect Diagnose S1 2 S2 2 S3 2 S4 2 S5 1 1 S6 2 S7 2 S8 1 1 S9 2 S10 2 If “One Truth”, “No Disagreement” & “Experts”
  55. 55. Web & Media Group http://lora-aroyo.org @laroyo •  we rejected 3 fallacies: one truth, experts rule and disagreement is bad •  by encouraging disagreement • we exposed a richer set of possibilities • that help in identifying, processing & understanding context so …
  56. 56. Web & Media Group http://lora-aroyo.org @laroyo •  One truth: data collection efforts assume one correct interpretation for every example •  All examples are created equal: ground truth treats all examples the same – either match the correct result or not •  Detailed guidelines help: if examples cause disagreement - add instructions to limit interpretations •  Disagreement is bad: increase quality of annotation data by reducing disagreement among the annotators •  One is enough: most of the annotated examples are evaluated by one person •  Experts are better: annotators with domain knowledge provide better annotations •  Once done, forever valid: annotations are not updated; new data not aligned with old 7 Myths myths directly influence the practice of collecting human annotated data; Need to be revised with a new theory of truth (CrowdTruth) Lora Aroyo, Chris Welty: Truth is a Lie: 7 Myths about Human Annotation, AI Magazine 2014.
  57. 57. Web & Media Group http://lora-aroyo.org @laroyo
  58. 58. Web & Media Group http://lora-aroyo.org @laroyo crowdtruth.org
  59. 59. Web & Media Group http://lora-aroyo.org @laroyo crowdtruth.org a new approach to understanding semantics harnessing the power & diversity of the crowd
  60. 60. Web & Media Group http://lora-aroyo.org @laroyo crowdtruth.org informs a vector space model of truth instead of a boolean, fuzzy or statistical model
  61. 61. Web & Media Group http://lora-aroyo.org @laroyo observerreferent sign has   triangle of reference (Ogden & Richards, 1923)
  62. 62. Web & Media Group http://lora-aroyo.org @laroyo crowd annotatorannotation example annotation   choices   Lora Aroyo, Chris Welty: The Three Sides of CrowdTruth. J. Human Computation. 1(1). 2014. triangle of reference
  63. 63. Web & Media Group http://lora-aroyo.org @laroyo crowd annotatorannotation example annotation   choices   passionate,   rollicking,   literate,   humorous,  silly,   aggressive,  fiery,   does  not  fit  into   rousing,   cheerful,  fun,   poignant,  wis9ul,   campy,  quirky,   tense,  anxious,   any  of  the  5   confident,   sweet,  amiable,   bi>ersweet,   whimsical,  wi>y,   intense,  vola?le,   clusters   boisterous,   good-­‐natured   autumnal,   wry   visceral       rowdy       brooding               Category 1 Category 3 Category 5 Lora Aroyo, Chris Welty: The Three Sides of CrowdTruth. J. Human Computation. 1(1). 2014. triangle of reference
  64. 64. Web & Media Group http://lora-aroyo.org @laroyo crowd annotatorannotation example annotation   choices   treat   prevent   side  effect   diagnose   cause   other   Treat Cause Prevent Lora Aroyo, Chris Welty: The Three Sides of CrowdTruth. J. Human Computation. 1(1). 2014. ANTIBIOTICS are the first line treatment for indications of TYPHUS. triangle of reference
  65. 65. Web & Media Group http://lora-aroyo.org @laroyo 1 1 1 Worker Vector
  66. 66. Web & Media Group http://lora-aroyo.org @laroyo 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 4 3 0 0 5 1 0 Sentence Vector OUR NEW THEORY OF TRUTH
  67. 67. Web & Media Group http://lora-aroyo.org @laroyo Unclear relationship between the two arguments reflected in the disagreement Sentence Clarity Feeling the way the CHEST expands (PALPATION), can identify areas of the lung that are full of fluid. ?PALPATIONIs CHEST related to diagnose location associated with is_a otherpart_of 0 0 02 3 0 0 0 1 0 0 44 1
  68. 68. Web & Media Group http://lora-aroyo.org @laroyo Clearly expressed relation between the two arguments reflected in the agreement Sentence Clarity ?CONJUNCTIVITISHYPERAEMIA related toIs 0 0 0 1 0 0 0 013 0 0 0 0 0 symptomcause Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora) of the eyes are symptoms common to all forms of CONJUNCTIVITIS.
  69. 69. Web & Media Group http://lora-aroyo.org @laroyo 1 13 4 3 1 7 2 1 6 2 1 1 1 1 1 1 3 3 1 1 8 3 1 5 7 1 1 2 1 Relation Similarity
  70. 70. Web & Media Group http://lora-aroyo.org @laroyo Measures how clearly a sentence expresses a relation 0 1 1 0 0 4 3 0 0 5 1 0 Unit vector for relation R6 Sentence Vector Cosine = .55 confidence Sentence-Relation Score
  71. 71. Web & Media Group http://lora-aroyo.org @laroyo Measured per worker averaged over all sentences (whether worker consistently disagrees with everyone else) Worker-Sentence Disagreement 0 1 1 0 0 4 3 0 0 5 1 0 Worker’s sentence vector Sentence Vector AVG (Cosine) Worker Quality
  72. 72. Web & Media Group http://lora-aroyo.org @laroyo Measured for pairs of workers / sentence averaged over all sentences (identify communities of though) Worker-Worker Disagreement 0 1 1 0 0 1 0 0 0 0 1 0 Worker 1 sentence vector Worker 2 Vector AVG (Cosine) Worker Similarity
  73. 73. Web & Media Group http://lora-aroyo.org @laroyo
  74. 74. Web & Media Group http://lora-aroyo.org @laroyo Example Results
  75. 75. Web & Media Group http://lora-aroyo.org @laroyo Example Results
  76. 76. Web & Media Group http://lora-aroyo.org @laroyo Example Results
  77. 77. Web & Media Group http://lora-aroyo.org @laroyo crowdtruth.org experimental results show that CrowdTruth is a better way of capturing context a more accurate way to predict & explain truth
  78. 78. Web & Media Group http://lora-aroyo.org @laroyo Training a Relation Extraction Classifier F1 Cost per sentence CrowdTruth 0.642 $0.66 Expert Annotator 0.638 $2.00 Single Annotator 0.492 $0.08 “wisdom of the crowd” provides training data that is at least as good if not better than experts only with proper analytic framework for harnessing disagreement from the crowd
  79. 79. Web & Media Group http://lora-aroyo.org @laroyo Learning Curves (crowd with pos./neg. threshold at 0.5 above 400 sent.: crowd consistently over baseline & single above 600 sent.: crowd out-performs experts
  80. 80. Web & Media Group http://lora-aroyo.org @laroyo Learning Curves
  81. 81. Web & Media Group http://lora-aroyo.org @laroyo Take Home Message True  and  False  is  not  enough,     There  is  diversity  in  human  interpreta3on     CrowdTruth  introduces  a  spa3al  representa3on  of   meaning  that  through  crowdsourcing     harnesses  disagreement     Using  CrowdTruth   untrained  workers  can  be  just  as  reliable  as     highly  trained  experts  
  82. 82. Web & Media Group http://lora-aroyo.org @laroyo https://www.youtube.com/watch?v=CyAI_lVUdzM To be AND not to be: quantum intelligence? Lora Aroyo & Chris Welty

×