Professor Human Computer Interaction at Vrije Universiteit Amsterdam / VU
May. 26, 2014•0 likes•28,037 views
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities projects 2014
This presentation was given at the NL eScience Center during the "De Geest Uit De Fles" event for the kick-off of the eHumanities projects in 2014:
http://esciencecenter.nl/agenda/703-26-may-de-geest-uit-de-fles/
1. Crowds & Niches Teaching Machines to Diagnose
Crowds & Niches
Teaching Machines to Diagnose
Lora Aroyo
2. Crowds & Niches Teaching Machines to Diagnose
IBM Confidential
• Open Domain Question-Answering Machine, that given
– Rich Natural Language Questions
– Over a Broad Domain of Knowledge
• Won a 2-game Jeopardy match against the all-time winners
– viewed by over 50,000,000
3. Crowds & Niches Teaching Machines to Diagnose
Watson MD
• Adapt Watson to Medical QA
• Mainly an NLP task
• Cognitive computing systems need human-annotated data for training, testing, and evaluation
The human annotation task is one of semantic interpretation
Now answering
medical questions!
4. Crowds & Niches Teaching Machines to Diagnose
Gadolinium agents are useful for patients with renal
impairment, but in patients with severe renal failure
requiring dialysis it presents a risk of nephrogenic
systemic fibrosis.
Mention detection: find the spans (begin, end) of relevant medical
terms (factors) in a passage.
Factor Typing: find the type of each mention
Factor types shown on the passage (NER): substance, disorder, disorder, disorder, treatment
NLP Tasks
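A minimal sketch of mention detection and factor typing on the passage above, assuming a small hand-made lexicon. The lexicon, the pairing of each term with a type, and the function name `detect_mentions` are illustrative assumptions and not the system described in the talk, which would rely on trained NLP components rather than exact string lookup.

```python
import re

# Hypothetical mini-lexicon: surface form -> factor type. The type names come
# from the slide, but pairing each term with a type is an assumption.
LEXICON = {
    "gadolinium agents": "substance",
    "renal impairment": "disorder",
    "severe renal failure": "disorder",
    "dialysis": "treatment",
    "nephrogenic systemic fibrosis": "disorder",
}

PASSAGE = (
    "Gadolinium agents are useful for patients with renal impairment, "
    "but in patients with severe renal failure requiring dialysis it presents "
    "a risk of nephrogenic systemic fibrosis."
)


def detect_mentions(passage, lexicon=LEXICON):
    """Return (begin, end, text, type) tuples for each lexicon term in the passage."""
    mentions = []
    for term, factor_type in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, passage, flags=re.IGNORECASE):
            mentions.append((match.start(), match.end(), match.group(0), factor_type))
    # Sort by begin offset so mentions come out in reading order.
    return sorted(mentions)


mentions = detect_mentions(PASSAGE)
for begin, end, text, factor_type in mentions:
    print(begin, end, text, factor_type)
```

Each tuple carries the (begin, end) character span, so `PASSAGE[begin:end]` gives back the mention text.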
5. Crowds & Niches Teaching Machines to Diagnose
NLP Tasks
Gadolinium agents are useful for patients with renal
impairment, but in patients with severe renal failure
requiring dialysis it presents a risk of nephrogenic
systemic fibrosis.
Mention detection: find the spans (begin, end) of relevant medical terms
(factors) in a passage.
Factor Typing: find the type of each mention
Factor (Entity) Identification: find the corresponding ids for a mentioned
factor in a knowledge-base
C0016911
C1408325
C0035078
C1619692
C0019004
6. Crowds & Niches Teaching Machines to Diagnose
Gadolinium agents are useful for patients with renal
impairment, but in patients with severe renal failure
requiring dialysis it presents a risk of nephrogenic
systemic fibrosis.
Mention detection: find the spans (begin, end) of relevant medical terms
(factors) in a passage.
Factor Typing: find the type of each mention
Factor (Entity) Identification: find the corresponding ids for a mentioned
factor in a knowledge-base
Relation detection: find the relations between factors that are expressed in a passage.
Relation labels shown on the passage: cause, treats, treats, contra-indicates
NLP Tasks
7. Crowds & Niches Teaching Machines to Diagnose
NLP Tasks
Gadolinium agents are useful for patients with renal
impairment, but in patients with severe renal failure
requiring dialysis it presents a risk of nephrogenic
systemic fibrosis.
Mention detection: find the spans (begin, end) of relevant medical terms
(factors) in a passage.
Factor Typing: find the type of each mention
Factor (Entity) Identification: find the corresponding ids for a mentioned
factor in a knowledge-base
Relation detection: find the relations between factors that are expressed in a passage.
Coreference: Find the mentions in a sentence that refer to the same factor.
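The five annotation layers listed above could be held in one structure along these lines. This is a hypothetical sketch: the class names, the single `cause` relation, and the choice of "it" as a coreferent of "Gadolinium agents" are assumptions made for illustration. Knowledge-base ids are left unset because the slides do not say which id belongs to which mention.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    begin: int                          # character offset where the span starts
    end: int                            # character offset just past the span
    text: str
    factor_type: Optional[str] = None   # factor typing
    kb_id: Optional[str] = None         # factor identification (unset in this sketch)


@dataclass(frozen=True)
class Relation:
    label: str
    source: Mention
    target: Mention


@dataclass
class AnnotatedPassage:
    text: str
    mentions: List[Mention] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
    coref_chains: List[List[Mention]] = field(default_factory=list)

    def add_mention(self, surface: str, factor_type: Optional[str] = None) -> Mention:
        """Locate the first whole-word occurrence of `surface` and record it."""
        match = re.search(r"\b" + re.escape(surface) + r"\b", self.text)
        if match is None:
            raise ValueError(f"{surface!r} not found in passage")
        mention = Mention(match.start(), match.end(), surface, factor_type)
        self.mentions.append(mention)
        return mention


passage = AnnotatedPassage(
    "Gadolinium agents are useful for patients with renal impairment, "
    "but in patients with severe renal failure requiring dialysis it presents "
    "a risk of nephrogenic systemic fibrosis."
)
agents = passage.add_mention("Gadolinium agents", "substance")
fibrosis = passage.add_mention("nephrogenic systemic fibrosis", "disorder")
pronoun = passage.add_mention("it")

# Illustrative annotations only; the slides show labels, not this exact pairing.
passage.relations.append(Relation("cause", agents, fibrosis))
passage.coref_chains.append([agents, pronoun])
```

Keeping spans, types, ids, relations, and coreference chains in separate fields mirrors the way the slides introduce the tasks one layer at a time.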
8. Crowds & Niches Teaching Machines to Diagnose
Gold Standard
Assumption
• Cognitive systems need to be told what is right and what is wrong
• A gold standard or ground truth
• Performance is measured on test sets vetted by human experts →
never perfect, always improving against test data
• Historically, gold standards are created assuming that for each annotated
instance there is a single right answer
• Gold standard quality is measured in inter-annotator agreement →
does not account for perspectives, for reasonable alternative interpretations
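To make "inter-annotator agreement" concrete, here is a small sketch using Cohen's kappa for two annotators. The slides do not name a specific coefficient, so kappa is an assumption chosen because it is a common one, and the label sequences are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical relation labels from two annotators for four sentences.
annotator_1 = ["cause", "treats", "cause", "treats"]
annotator_2 = ["cause", "treats", "side-effect", "treats"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

A single number like this treats the one differing item as error. It cannot say whether "cause" versus "side-effect" is a mistake or a reasonable alternative reading, which is the limitation the slide points at.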
9. Crowds & Niches Teaching Machines to Diagnose
but people don’t always agree…
10. Crowds & Niches Teaching Machines to Diagnose
Disagreement
Gadolinium agents are useful for patients with renal
impairment, but in patients with severe renal failure
requiring dialysis there is a risk of nephrogenic
systemic fibrosis.
cause
11. Crowds & Niches Teaching Machines to Diagnose
Gadolinium agents are useful for patients with renal
impairment, but in patients with severe renal failure
requiring dialysis there is a risk of nephrogenic
systemic fibrosis.
side-effect
The human annotation task is one
of semantic interpretation
Disagreement
12. Crowds & Niches Teaching Machines to Diagnose
Position
maybe this disagreement is a signal and not noise?
can we harness it?
13. Crowds & Niches Teaching Machines to Diagnose
Key Question
How do we represent & measure disagreement in a
way that it can be harnessed?
14. Crowds & Niches Teaching Machines to Diagnose
Crowd Truth
Annotator disagreement is signal,
not noise.
It is indicative of the variation in
human semantic interpretation of
signs, and can indicate ambiguity,
vagueness, over-generality, etc.
http://www.freefoto.com/preview/01-47-44/Flock-of-Birds
15. Crowds & Niches Teaching Machines to Diagnose
Position
symbiosis between humans & machines
machines learn from humans & machines help humans
16. Crowds & Niches Teaching Machines to Diagnose
Crowd Truth Framework
17. Crowds & Niches Teaching Machines to Diagnose
Human-Machine Workflows
18. Crowds & Niches Teaching Machines to Diagnose
Relation Extraction
Crowdsourcing Ground Truth Data: CrowdTruth
Relations overlap in meaning
Sentences are vague and ambiguous
Experts have different interpretations
20. Crowds & Niches Teaching Machines to Diagnose
Representation
Worker Vector (figure): 1 1 1
Gadolinium agents are useful for patients with renal
impairment, but in patients with severe renal failure
requiring dialysis there is a risk of nephrogenic systemic
fibrosis.
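A rough sketch of the representation on this slide, assuming each worker's judgment on a sentence is a binary vector with one component per relation, and that the sentence vector is the element-wise sum of the worker vectors. The relation list, the worker ids, and their choices below are hypothetical.

```python
# Hypothetical relation inventory; the real task offered its own set.
RELATIONS = ["cause", "treats", "side-effect", "contra-indicates", "other"]


def worker_vector(selected, relations=RELATIONS):
    """Binary vector: 1 for every relation the worker selected, 0 otherwise."""
    unknown = set(selected) - set(relations)
    if unknown:
        raise ValueError(f"unknown relations: {sorted(unknown)}")
    return [1 if relation in selected else 0 for relation in relations]


def sentence_vector(worker_vectors):
    """Element-wise sum of all worker vectors for one sentence."""
    return [sum(column) for column in zip(*worker_vectors)]


# Invented judgments from three workers on the Gadolinium sentence.
judgments = {
    "w1": {"cause"},
    "w2": {"side-effect"},
    "w3": {"cause", "side-effect"},
}
vectors = {worker: worker_vector(chosen) for worker, chosen in judgments.items()}
sent_vec = sentence_vector(vectors.values())
print(sent_vec)  # [2, 0, 2, 0, 0]
```

The sentence vector keeps the split between "cause" and "side-effect" instead of collapsing it into a single label, so the disagreement stays available as signal.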
22. Crowds & Niches Teaching Machines to Diagnose
Feeling the way the CHEST expands (PALPATION), can identify areas of
the lung that are full of fluid.
Is CHEST related to PALPATION?
Relation labels visible on the slide: diagnose, location, associated with, is_a, part_of, other
Worker votes per relation: 0 0 0 2 3 0 0 0 1 0 0 4 4 1
Disagreement for
Sentence Clarity
Unclear relationship between the two arguments
reflected in the disagreement
23. Crowds & Niches Teaching Machines to Diagnose
Is HYPERAEMIA related to CONJUNCTIVITIS?
Relation labels visible on the slide: symptom, cause
Worker votes per relation: 0 0 0 1 0 0 0 0 13 0 0 0 0 0
Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora)
of the eyes are symptoms common to all forms of CONJUNCTIVITIS.
Disagreement for
Sentence Clarity
Clearly expressed relation between the two
arguments reflected in the agreement
24. Crowds & Niches Teaching Machines to Diagnose
Sentence-Relation
Score
Measures how clearly a sentence
expresses a relation
Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0
Unit vector for relation R6
Cosine = .55
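The score on this slide can be reproduced with a few lines, assuming that relation R6 corresponds to the sixth component of the sentence vector (index 5 when counting from zero). The function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_relation_score(sentence_vec, relation_index):
    """Cosine between the sentence vector and the unit vector of one relation."""
    unit = [0] * len(sentence_vec)
    unit[relation_index] = 1
    return cosine(sentence_vec, unit)


# Sentence vector from the slide; R6 is assumed to be the sixth component.
sentence_vec = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
score = sentence_relation_score(sentence_vec, 5)
print(round(score, 2))  # 0.55
```

With 4 votes on R6 and a vector length of sqrt(53), the score is 4 / 7.28, which rounds to the .55 shown on the slide. A sentence where nearly all votes land on one relation would score close to 1.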
26. Crowds & Niches Teaching Machines to Diagnose
Crowd Truth Metrics
Relation Extraction
Three parts to understand human interpretations:
• Sentence
– How good is a sentence for the relation extraction task?
• Workers
– How well does a worker understand the sentence?
• Relations
– Is the meaning of the relation clear?
– How ambiguous/confusable is it?
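For the worker question above, one possible measure is how closely a worker's vector matches what everyone else said about the same sentence. The sketch below compares each worker with the sum of the other workers' vectors. The leave-one-out comparison, the function names, and the vote data are assumptions for illustration, since the slide only poses the question.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def worker_sentence_agreement(worker_vectors, worker_id):
    """Cosine between one worker's vector and the sum of all other workers' vectors."""
    own = worker_vectors[worker_id]
    columns = zip(*worker_vectors.values())
    # Subtract the worker's own vote so they are not compared with themselves.
    rest = [sum(column) - mine for column, mine in zip(columns, own)]
    return cosine(own, rest)


# Hypothetical votes from three workers over three relations.
workers = {
    "w1": [1, 0, 0],
    "w2": [1, 0, 0],
    "w3": [0, 0, 1],
}
scores = {w: worker_sentence_agreement(workers, w) for w in workers}
for worker, value in scores.items():
    print(worker, round(value, 2))
```

A low score on one sentence does not by itself mark a poor worker, because the sentence may simply be unclear. Such a score would have to be read together with the sentence-level measure.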
27. Crowds & Niches Teaching Machines to Diagnose
Human-Machine Workflows
28. Crowds & Niches Teaching Machines to Diagnose
Crowdtruth.org
30. Crowds & Niches Teaching Machines to Diagnose
Provenance of Crowdsourcing
31. Crowds & Niches Teaching Machines to Diagnose
Watson MD
• Not every task is suitable for
lay crowd, some require
domain expertise
• Domain experts are busy
• How to get them motivated
to perform annotation tasks?
• How to make it efficient for
them and effective for
annotations?
Crowd vs. Experts
32. Crowds & Niches Teaching Machines to Diagnose
Dr. Watson Experts Game
38. Crowds & Niches Teaching Machines to Diagnose
• Experimenting with:
• different domains, e.g. art, history, news
• different formats, e.g. text, images, videos
• different annotation tasks, e.g.
• medical factors, relations, synonyms, negation
• events, event types, participants, locations
• flowers, birds
• Integrating crowds from mTurk and CrowdFlower
with domain experts from Dr. Detective, Waisda? and
Accurator
Domain Independent
39. Crowds & Niches Teaching Machines to Diagnose
The Crew
• Lora Aroyo (VU)
• Chris Welty (IBM)
• Robert-Jan Sips (IBM)
• Anca Dumitrache (VU)
• Oana Inel (VU)
• Khalid Khamkham (VU)
• Tatiana Cristea (VU)
• Rens v. Honschooten (VU)
• Benjamin Timmermans (VU)
• Harriëtte Smook (VU)
• Arne Rutjes (IBM)
• Jelle van der Ploeg (IBM)
40. Crowds & Niches Teaching Machines to Diagnose
http://crowdtruth.org