
My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone

Ambiguity in interpreting signs is not a new idea, yet the vast majority of research in machine interpretation of signals such as speech, language, images, video, audio, etc., tends to ignore ambiguity. This is evidenced by the fact that metrics for quality of machine understanding rely on a ground truth, in which each instance (a sentence, a photo, a sound clip, etc.) is assigned a discrete label, or set of labels, and the machine’s prediction for that instance is compared to the label to determine if it is correct. This comparison yields the familiar precision, recall, accuracy, and F-measure metrics, but clearly presupposes that such a determination can be made. CrowdTruth is a form of collective intelligence based on a vector representation that accommodates diverse interpretation perspectives and encourages human annotators to disagree with each other, in order to expose latent elements such as ambiguity and worker quality. In other words, CrowdTruth assumes that when annotators disagree on how to label an example, it is because the example is ambiguous, the worker isn’t doing the task properly, or the task itself is not clear. In previous work on CrowdTruth, the focus was on how the disagreement signals from low-quality workers and from unclear tasks can be isolated. Recently, we observed that disagreement can also signal ambiguity. The basic hypothesis is that, if workers disagree on the correct label for an example, then it will be more difficult for a machine to classify that example. Elaborate data analysis to determine whether the source of the disagreement is ambiguity supports our intuition that low clarity signals ambiguity, while high-clarity sentences quite obviously express one or more of the target relations. In this talk I will share the experiences and lessons learned on the path to understanding diversity in human interpretation and the ways to capture it as ground truth to enable machines to deal with such diversity.
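The vector representation mentioned above can be illustrated with a minimal Python sketch (an illustration of the idea only, not the actual CrowdTruth implementation; the function names are hypothetical). Each worker's judgment is a set of chosen labels; aggregating them gives a unit vector whose per-label fraction acts as a rough clarity signal: high agreement on one label suggests a clear example, while spread across labels signals ambiguity.

```python
from collections import Counter

def unit_vector(worker_judgments):
    """Aggregate worker annotations for one unit (a sentence, photo, clip, ...)."""
    agg = Counter()
    for labels in worker_judgments:
        agg.update(labels)  # each chosen label adds one vote
    return agg

def label_score(agg, label, n_workers):
    """Fraction of workers who chose the label: a rough 'clarity' signal."""
    return agg[label] / n_workers

# Three workers annotate one sentence with the relations they see expressed.
judgments = [{"treats"}, {"treats"}, {"prevents"}]
agg = unit_vector(judgments)
print(label_score(agg, "treats", len(judgments)))    # majority vote: clearer
print(label_score(agg, "prevents", len(judgments)))  # minority vote: ambiguity signal
```

In the real CrowdTruth metrics the per-label scores are cosine-based and weighted by worker quality; the fraction above is the simplest version of the same idea.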


  1. 1. http://lora-aroyo.org @laroyo Disrupting the Semantic Comfort Zone Lora Aroyo Web & Media Group
  2. 2. Web & Media Group http://lora-aroyo.org @laroyo Bulgaria The Netherlands Sofia NYC Personal Semantics
  3. 3. Web & Media Group http://lora-aroyo.org @laroyo Riva del Garda, Italy, 2014 Semantic Social Life
  4. 4. Web & Media Group http://lora-aroyo.org @laroyo 4 To understand the value of Semantic Web for e-learning you have to understand people, e.g. how they learn, interact & consume information
  5. 5. Web & Media Group http://lora-aroyo.org @laroyo 5 To understand the value of Semantic Web for e-learning you have to understand people, e.g. how they interact & consume information
  6. 6. Web & Media Group http://lora-aroyo.org @laroyo 6 To understand the value of Semantic Web for cultural heritage you have to understand people, e.g. how they interact & consume information
  7. 7. Web & Media Group http://lora-aroyo.org @laroyo 7 To understand the value of Semantic Web for cultural heritage you have to understand people, e.g. how they interact & consume information
  8. 8. Web & Media Group http://lora-aroyo.org @laroyo To understand the value of Semantic Web for digital humanities, you have to understand people, e.g. how they interact & consume information
  9. 9. Web & Media Group http://lora-aroyo.org @laroyo people are in the center of everything people & their semantics, i.e. their real-world behavior, online interactions, information needs, information consumption habits, personal preferences ...
  10. 10. Web & Media Group http://lora-aroyo.org @laroyo CrowdTruth team
  11. 11. http://lora-aroyo.org @laroyo Web & Media Group the evolution of the semantic web: great moments from the 1980s to ESWC 2017
  12. 12. http://lora-aroyo.org @laroyo 50's: AI more or less begins … 80's: expert systems 90's: knowledge acquisition from experts 00's: standards & interoperability 10's: big data & large crowds A long time ago in a galaxy far, far away …
  13. 13. http://lora-aroyo.org @laroyo 80’s - empire of the experts
  14. 14. http://lora-aroyo.org @laroyo 80's - empire of the experts Advances in hardware and SDEs: PCs, workstations, Symbolics, Sun; new architectures like the Hypercube; LISP, Prolog, OPS. AI can now BUILD SYSTEMS. Primary focus on experts and rules: What is the knowledge of experts? What is the form of this knowledge? Graphs, logic, rules, frames. How do experts reason? Deduction, induction. Work on form & process remained academic (what happened inside the system, making its reasoning as proper and as good as possible), while industry forged ahead with ad-hoc & proprietary systems and actually tried to build expert systems. Origins of uncertain KR: fuzzy, probabilistic
  15. 15. http://lora-aroyo.org @laroyo Piero Bonissone and the DELTA/CATS expert system for locomotive repair with David Smith, a locomotive repair expert. Buchanan and Shortliffe’s MYCIN project at Stanford built a huge rule base for medical diagnosis, working with an extensive team of medical experts.
  16. 16. http://lora-aroyo.org @laroyo 90’s - knowledge acquisition from experts
  17. 17. http://lora-aroyo.org @laroyo
  18. 18. http://lora-aroyo.org @laroyo 90's - knowledge acquisition from experts The 90's brought attention to knowledge acquisition. Knowing that expert systems could by then functionally work, the focus (in practice as well as in scientific research and technology development) shifted to the then-bigger challenge of how to acquire knowledge in real-world scenarios. It seems natural that, after looking inside the systems, one needed to pay attention to how to actually get the knowledge from the world outside and frame it into properly structured knowledge inside the system. Dream of the 90's
  19. 19. http://lora-aroyo.org @laroyo
  20. 20. http://lora-aroyo.org @laroyo 00’s - interoperability & standards odyssey
  21. 21. http://lora-aroyo.org @laroyo 10’s - AI Awakens • Machine Learning • Neural networks • Solving basic perceptual problems instead of high-expertise ones • Ambiguity tolerant reasoning • Non-taxonomic ordering → non-taxonomic reasoning • folksonomies, clustering, diversity of perspectives, embeddings
  22. 22. Web & Media Group http://lora-aroyo.org @laroyo 2011
  23. 23. http://lora-aroyo.org @laroyo 10’s – Big Data
  24. 24. Web & Media Group http://lora-aroyo.org @laroyo Human Annotation Central in Machine Learning Training & Evaluation 10’s – Crowds
  25. 25. http://lora-aroyo.org @laroyo Web & Media Group Team BellKor wins Netflix Prize 2007 (timeline: 1998, 2006, 2009)
  26. 26. Web & Media Group http://lora-aroyo.org @laroyo
  27. 27. Web & Media Group http://lora-aroyo.org @laroyo the semantic comfort zone
  28. 28. Web & Media Group http://lora-aroyo.org @laroyo One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example All examples are created equal: triples are triples, one is not more important than another, they are all either true or false Disagreement bad: when people disagree, they don’t understand the problem Experts rule: knowledge is captured from domain experts One is enough: knowledge by a single expert is sufficient Detailed explanations help: if examples cause disagreement - add instructions Once done, forever valid: knowledge is not updated; new data not aligned with old “Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
  29. 29. Web & Media Group http://lora-aroyo.org @laroyo Use Case: video archive enrichment Search Behavior of Media Professionals at an Audiovisual Archive: A Transaction Log Analysis (2009). B. Huurnink, L. Hollink, W. van den Heuvel, M. de Rijke.
  30. 30. Web & Media Group http://lora-aroyo.org @laroyo Use Case: video archive enrichment Goal: make the multimedia content of Dutch National Video Archive accessible to large audiences Comfort Zone Solution: media professionals watch & annotate videos. Of course!
  31. 31. Web & Media Group http://lora-aroyo.org @laroyo but ... expensive & doesn't scale: time-consuming, 5 times the video duration; professional vocabulary: experts use a specific vocabulary that is unknown to general audiences
  32. 32. Web & Media Group http://lora-aroyo.org @laroyo … and people search for fragments, but experts annotate full videos: 35% of search queries result in nothing being found
  33. 33. Web & Media Group http://lora-aroyo.org @laroyo Use Case: real world QA for Watson Crowdsourcing ground truth for Question Answering using CrowdTruth (2015). B Timmermans, L Aroyo, C Welty
  34. 34. Web & Media Group http://lora-aroyo.org @laroyo Goal: gather questions that real people ask for training & evaluating Watson Data: 30K questions + candidate answers from Yahoo! Answers Comfort Zone Solution: ask people if the passage answers the question (Y/N). Simple! Use Case: real world QA for Watson
  35. 35. Web & Media Group http://lora-aroyo.org @laroyo Contradicting evidence Is Coral a plant? • “Coral almost could be considered half-plant [..]” • “[..] organism, such as a coral, resembling a stony plant.” Unanswerable questions • Can I take a pill if you don't have a child yet? • Is the spelling for being drunk right? • Is napster black? Unclear answer type Is paper animal plant or man made? Multiple right answers to a question What is the best university in NY? (subjective) YES or NO?
  36. 36. Web & Media Group http://lora-aroyo.org @laroyo Use Case: medical relation extraction for Watson Crowdsourcing Ground Truth for Medical Relation Extraction (2017). A Dumitrache, L Aroyo, C Welty
  37. 37. Web & Media Group http://lora-aroyo.org @laroyo Goal: gather data to train Watson to read medical text & automatically extract a medical relations KB Comfort Zone Solution: having medical experts read & annotate examples Use Case: medical relation extraction for Watson
  38. 38. Web & Media Group http://lora-aroyo.org @laroyo ANTIBIOTICS are the first line treatment for indications of TYPHUS. treats(ANTIBIOTICS, TYPHUS)? Expert: yes Patients with TYPHUS who were given ANTIBIOTICS exhibited side-effects. treats(ANTIBIOTICS, TYPHUS)? Expert: yes With ANTIBIOTICS in short supply, DDT was used during WWII to control the insect vectors of TYPHUS. treats(ANTIBIOTICS, TYPHUS)? Expert: yes. Are these three really all the same???
  39. 39. Web & Media Group http://lora-aroyo.org @laroyo Use Case: map music to moods
  40. 40. Web & Media Group http://lora-aroyo.org @laroyo Use Case: map music to moods Goal: annotate songs with emotional tags Comfort Zone Solution: people assign the prevalent mood of a song
  41. 41. Goal: Which is the mood most appropriate for each song? Choose one: Cluster 1: passionate, rousing, confident, boisterous, rowdy. Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured. Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding. Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry. Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral. Other: does not fit into any of the 5 clusters. (Lee and Hu 2012) 1 song - 1 mood???
  42. 42. Web & Media Group http://lora-aroyo.org @laroyo One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example All examples are created equal: triples are triples, one is not more important than another, they are all either true or false Disagreement bad: when people disagree, they don’t understand the problem Experts rule: knowledge is captured from domain experts One is enough: knowledge by a single expert is sufficient Detailed explanations help: if examples cause disagreement - add instructions Once done, forever valid: knowledge is not updated; new data not aligned with old “Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
  43. 43. Web & Media Group http://lora-aroyo.org @laroyo One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example All examples are created equal: triples are triples, one is not more important than another, they are all either true or false Disagreement bad: when people disagree, they don’t understand the problem Experts rule: knowledge is captured from domain experts One is enough: knowledge by a single expert is sufficient Detailed explanations help: if examples cause disagreement - add instructions Once done, forever valid: knowledge is not updated; new data not aligned with old “Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty Semantic Comfort Zone
  44. 44. Web & Media Group http://lora-aroyo.org @laroyo One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example All examples are created equal: triples are triples, one is not more important than another, they are all either true or false Disagreement bad: when people disagree, they don’t understand the problem Experts rule: knowledge is captured from domain experts One is enough: knowledge by a single expert is sufficient Detailed explanations help: if examples cause disagreement - add instructions Once done, forever valid: knowledge is not updated; new data not aligned with old “Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty Semantic Comfort Zone disrupted
  45. 45. Web & Media Group http://lora-aroyo.org @laroyo
  46. 46. Web & Media Group http://lora-aroyo.org @laroyo interestingly …
  47. 47. Web & Media Group http://lora-aroyo.org @laroyo • collective decisions of large groups of people • a group of error-prone decision-makers can be surprisingly good at picking the best choice • with a thumbs-up or thumbs-down vote, each voter's chance of picking the right answer needs to be > 50% • the odds that most of them will pick the right answer are greater than the odds that any one of them will pick it alone • performance gets better as group size grows 1785 Marquis de Condorcet “wisdom of crowds”
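Condorcet's claim on this slide is easy to check numerically. The sketch below (my illustration, not part of the deck) computes the exact probability that a simple majority of n independent voters is correct, when each voter is right with probability p:

```python
from math import comb

def majority_correct(n, p):
    """P(majority of n independent voters is right); n odd, each right w.p. p.
    Sums the binomial tail: at least n//2 + 1 voters must be correct."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# With p = 0.6 > 0.5, accuracy grows with group size, as Condorcet predicted.
for n in (1, 11, 101):
    print(n, round(majority_correct(n, 0.6), 3))
```

Note the theorem cuts both ways: with p < 0.5 the same formula shows the majority gets *worse* as the group grows, which is why worker quality still matters.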
  48. 48. Web & Media Group http://lora-aroyo.org @laroyo • asked 787 people to guess the weight of an ox • none got the right answer • their collective guess was almost perfect 1906 Sir Francis Galton “wisdom of crowds”
  49. 49. Web & Media Group http://lora-aroyo.org @laroyo WWII Math Rosies 1942: Ballistics calculations and flight trajectories
  50. 50. Web & Media Group http://lora-aroyo.org @laroyo NASA’s Computer Room transcribe raw flight data from celluloid film & oscillograph paper
  51. 51. Web & Media Group http://lora-aroyo.org @laroyo can we harness it?
  52. 52. http://lora-aroyo.org @laroyo Web & Media Group CrowdTruth http://crowdtruth.org/
  53. 53. http://lora-aroyo.org @laroyo Web & Media Group CrowdTruth Three basic causes of disagreement: workers, examples, target semantics Disagreement is signal, not noise. It is indicative of the variation in human semantic interpretation It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality CrowdTruth: Machine-Human Computation Framework for Harnessing Disagreement in Gathering Annotated Data (2014), O. Inel, A. Dumitrache, L. Aroyo, C. Welty
  54. 54. Web & Media Group http://lora-aroyo.org @laroyo one truth: multiple truths all examples are created equal: each example is unique disagreement bad: disagreement is good experts rule: crowd rules one is enough: the more the better detailed explanations help: keep it simple stupid once done, forever valid: maintenance is necessary “Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
  55. 55. Web & Media Group http://lora-aroyo.org @laroyo changes needed video archive enrichment improve support for fragment search time-based annotations bridging vocabulary gap between searcher & cataloguer
  56. 56. Web & Media Group http://lora-aroyo.org @laroyo crowdsourcing video tagging two video tagging pilots
  57. 57. Web & Media Group http://lora-aroyo.org @laroyo @waisda http://waisda.nl engage crowds through continuous gaming
  58. 58. http://lora-aroyo.org @laroyo Web & Media Group “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
  59. 59. http://lora-aroyo.org @laroyo Web & Media Group time-based bernhard just “tags” “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
  60. 60. http://lora-aroyo.org @laroyo Web & Media Group objects (57%) westminster abbey abbey priester geestelijken hek paarden tocht aankomst koets kroning mensenmassa parade kroon regen “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
  61. 61. http://lora-aroyo.org @laroyo Web & Media Group persons (31%) bernhard juliana objects (57%) “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
  62. 62. http://lora-aroyo.org @laroyo Web & Media Group user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google; locations (7%): engeland; persons (31%); objects (57%) “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
  63. 63. http://lora-aroyo.org @laroyo Web & Media Group user vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google; tags describe mainly short segments, are often not very specific, and don't describe programmes as a whole “On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
  64. 64. Web & Media Group http://lora-aroyo.org @laroyo crowdsourcing medical relation extraction diversity of opinions independent perspectives multitude of contexts we exposed a richer set of possibilities that help in identifying, processing & understanding context
  65. 65. Web & Media Group http://lora-aroyo.org @laroyo Does this sentence express TREATS(Antibiotics, Typhus)? Patients with TYPHUS who were given ANTIBIOTICS exhibited several side-effects. With ANTIBIOTICS in short supply, DDT was used during World War II to control the insect vectors of TYPHUS. ANTIBIOTICS are the first line treatment for indications of TYPHUS. 95% 75% 50% The crowd results capture the natural ambiguity
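The percentages on this slide behave like CrowdTruth sentence-relation scores. Here is a minimal sketch of how such continuous scores can be turned into binary training labels, assuming the 0.5 positive/negative cutoff used on the learning-curve slides later in the deck (function names and the short sentence keys are illustrative):

```python
def relation_score(votes_for, n_workers):
    """Fraction of workers who said the sentence expresses the relation."""
    return votes_for / n_workers

def to_training_label(score, threshold=0.5):
    """Map a crowd score to a binary label. Scores at or near the threshold
    (like the DDT sentence below) are exactly the ambiguous cases worth
    flagging, down-weighting, or excluding from training."""
    return 1 if score > threshold else 0

# Crowd scores for the three TREATS(Antibiotics, Typhus) sentences above.
scores = {"first line treatment": 0.95,
          "exhibited side-effects": 0.75,
          "DDT insect vectors": 0.50}
labels = {s: to_training_label(v) for s, v in scores.items()}
```

A strict `>` comparison pushes the borderline 0.50 sentence to negative; whether to keep, drop, or weight such examples is a design choice the CrowdTruth work examines rather than hides.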
  66. 66. http://lora-aroyo.org @laroyo Web & Media Group What is the relation between the highlighted terms? He was the first physician to identify the relationship between HEMOPHILIA and HEMOPHILIC ARTHROPATHY. Experts: cause. Crowd: no relation. Experts hallucinate; the crowd reads the text literally and provides better examples to the machine
  67. 67. http://lora-aroyo.org @laroyo Web & Media Group Unclear relationship between the two arguments reflected in the disagreement Medical Relation Extraction
  68. 68. http://lora-aroyo.org @laroyo Web & Media Group Clearly expressed relation between the two arguments reflected in the agreement Medical Relation Extraction
  69. 69. http://lora-aroyo.org @laroyo Web & Media Group Unclear relationship between the two arguments reflected in the disagreement Medical Relation Extraction
  70. 70. http://lora-aroyo.org @laroyo Web & Media Group
  71. 71. http://lora-aroyo.org @laroyo Web & Media Group Learning Curves (crowd with pos./neg. threshold at 0.5) above 400 sentences: crowd consistently above baseline & single annotator; above 600 sentences: crowd outperforms experts
  72. 72. http://lora-aroyo.org @laroyo Web & Media Group Learning Curves Extended (crowd with pos./neg. threshold at 0.5) crowd consistently performs better than baseline
  73. 73. http://lora-aroyo.org @laroyo Web & Media Group # of Workers: Impact on Sentence-Relation Score
  74. 74. Web & Media Group http://lora-aroyo.org @laroyo Training a Relation Extraction Classifier (F1 / cost per sentence): CrowdTruth 0.642 / $0.66; Expert Annotator 0.638 / $2.00; Single Annotator 0.492 / $0.08. The “wisdom of the crowd” provides training data that is at least as good as, if not better than, that of experts, but only with a proper analytic framework for harnessing disagreement from the crowd
  75. 75. http://lora-aroyo.org @laroyo Web & Media Group map music to moods Goal: tag songs with emotional clusters Comfort Zone Solution: people assign the prevalent mood of a song
  76. 76. Web & Media Group http://lora-aroyo.org @laroyo
  77. 77. Web & Media Group http://lora-aroyo.org @laroyo Is this song …. ?Passionate Rousing Confident Boisterous Rowdy Literate Poignant Wistful Bittersweet Autumnal Brooding Rollicking Cheerful Fun Sweet Amiable Good-natured Humorous Silly Campy Whimsical Witty Wry Aggressive Fiery Tense Anxious Intense Volatile
  78. 78. Web & Media Group http://lora-aroyo.org @laroyo If “One Truth” & “No Disagreement” Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 W1 1 W2 1 W3 1 W4 1 W5 1 W6 1 W7 W8 W9 1 W10 1 Totals 1 3 1 2 1
  79. 79. Web & Media Group http://lora-aroyo.org @laroyo Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other W1 1 1 1 W2 1 1 1 W3 1 1 1 W4 1 1 W5 1 1 W6 1 1 1 W7 1 1 1 W8 1 1 1 W9 1 1 W10 1 1 1 1 1 Totals 3 5 6 5 2 8 If “Many Truths” & “Disagreement”
  80. 80. Web & Media Group http://lora-aroyo.org @laroyo can indicate alternative interpretations Worker Mood-C1 Mood-C2 Mood-C3 Mood-C4 Mood-C5 Other W10 1 1 1 1 1 Totals 3 5 6 5 2 8 Disagreement as Signal can indicate ambiguity in the categorisation can indicate low quality workers
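The three signals on this slide can be teased apart with a simple vector comparison. Below is a sketch, using illustrative annotation vectors over the six columns (C1..C5, Other) rather than the exact table data, of a simplified worker-quality measure: the cosine of each worker's vector with the sum of everyone else's. A spammer who checks every box (like W10 in the table) scores lower than workers who agree with the plurality. The real CrowdTruth worker metrics are iterative and unit-weighted; this is the one-shot version of the idea.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two annotation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def worker_quality(workers):
    """Cosine of each worker's vector with the column-wise sum of all the
    other workers' vectors: low values flag low-quality workers, which can
    then be separated from honest disagreement on ambiguous units."""
    quality = {}
    for name, vec in workers.items():
        others = [sum(col) for col in
                  zip(*(v for n, v in workers.items() if n != name))]
        quality[name] = cosine(vec, others)
    return quality

# Illustrative binary vectors over (C1, C2, C3, C4, C5, Other).
workers = {
    "W1":  [0, 1, 1, 0, 0, 0],
    "W2":  [0, 1, 1, 0, 0, 0],
    "W3":  [0, 0, 1, 0, 0, 1],
    "W4":  [0, 1, 0, 0, 0, 1],
    "W10": [1, 1, 1, 1, 1, 1],  # checks everything, like W10 on the slide
}
q = worker_quality(workers)
# W10 scores below the workers who agree with the plurality (W1, W2)
```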
  81. 81. http://lora-aroyo.org @laroyo so …
  82. 82. http://lora-aroyo.org @laroyo getting comfortable again
  83. 83. http://lora-aroyo.org @laroyo Take Home Message People first, experts second. True and false are not enough: there is diversity in human interpretation. CrowdTruth introduces a spatial representation of meaning that harnesses disagreement. With CrowdTruth, untrained workers can be just as reliable as highly trained experts
  84. 84. http://lora-aroyo.org @laroyo http://data.crowdtruth.org/
