2014 stsi research_meeting_mturk_pdf

Crowdsourcing for
information extraction:
(dynamic assembly of expert “humans”)
Benjamin Good
The Scripps Research Institute
bgood@scripps.edu
@bgood

High level goal: improve access
to published knowledge
(Figure: articles added to PubMed per year; >100/hour)
Thanks to Suzi Lewis from GO for smoothie

Example use
What diseases are treated with curcumin (turmeric)?
Data is
in there,
just hard
to get

Kilicoglu H, Shin D, Fiszman M, Rosemblat G, Rindflesch TC. SemMedDB: a PubMed-scale
repository of biomedical semantic predications. Bioinformatics. 2012 Dec 1;28(23):
3158-60. doi: 10.1093/bioinformatics/bts591. Epub 2012 Oct 8.
70,364,020 subject-predicate-object relations
NLM tool
24 million abstracts

Example
478 results
select * from PREDICATION_AGGREGATE where s_name = 'Curcumin' and predicate = 'TREATS'
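The query above can be tried locally. This is a minimal sketch, assuming a toy in-memory SQLite table: only the table name and the `s_name` and `predicate` columns come from the slide, while the `o_name` and `pmid` columns and every row are hypothetical stand-ins, not real SemMedDB content.

```python
import sqlite3

# Toy stand-in for SemMedDB's PREDICATION_AGGREGATE table.
# s_name and predicate come from the slide; o_name, pmid and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PREDICATION_AGGREGATE (s_name TEXT, predicate TEXT, o_name TEXT, pmid TEXT)"
)
rows = [
    ("Curcumin", "TREATS", "Disease A", "0000001"),
    ("Curcumin", "TREATS", "Disease B", "0000002"),
    ("Curcumin", "INHIBITS", "Enzyme C", "0000003"),
    ("Capsaicin", "TREATS", "Disease A", "0000004"),
]
conn.executemany("INSERT INTO PREDICATION_AGGREGATE VALUES (?, ?, ?, ?)", rows)

# Same filter as the slide's query, with the values bound as parameters.
query = "SELECT * FROM PREDICATION_AGGREGATE WHERE s_name = ? AND predicate = ?"
results = conn.execute(query, ("Curcumin", "TREATS")).fetchall()
for s_name, predicate, o_name, pmid in results:
    print(s_name, predicate, o_name, pmid)
print(len(results), "results")
```

The query returns every stored predication with curcumin as subject and TREATS as predicate; it says nothing about whether each extracted relation is correct, which is the question the next slides raise.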

Turmeric, the miracle spice!

Example
Data is easy to
access, but is it all
in there?
Is it correct?

?!?!
Effect on curcumin on cholesterol gall-stone induction.
Influence of dietary capsaicin and curcumin during
experimental induction of cholesterol gallstone in mice.
Spice bioactive compounds, capsaicin and curcumin, were
both individually and in combination examined for antilithogenic
potential during experimental induction of cholesterol
gallstones in mice.

The diet that contained capsaicin, curcumin, or their
combination reduced the incidence of cholesterol
gallstones by 50%, 66%, and 56%, respectively.

Facts of life in NLP
• False Positives and False Negatives always
present
• Human annotators remain the gold standard
• There are not nearly enough professional human
annotators to process every document
published

Observations
• There are about 2.92 billion Internet users
• Lots of them can read English
• Most of these would not have gotten that causal
relation wrong for curcumin…
Source: http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/

Hypothesis
• We can generate the equivalent of professional
annotators by incentivizing, guiding, and
aggregating the labor of large numbers of non-
professionals
Zhai 2013, Aroyo 2013, Burger 2014, Mortensen 2014, Good 2015

Information Extraction
1. Find mentions of high level concepts in text
2. Map mentions to specific terms in ontologies
3. Identify relationships between concepts

Microtask Crowdsourcing
• Distribute discrete units of work
(aka “human intelligence tasks” or
HITs) to many workers in parallel
who are paid to solve them.
Reported 500,000
registered workers in
2011 [1]
[1] Paritosh P, Ipeirotis P, Cooper M, Suri S: The computer is the new sewing
machine: benefits and perils of crowdsourcing. WWW '11 2011:325–326.

AMT, how it works
(Figure: diagram of Requester, Tasks, Amazon, and Workers)
Requester: for each task, specify:
• a qualification test
• how many workers per task
• how much we will pay per task
• a Web form for completing the task
Interact directly with Amazon system
Amazon manages:
• parallel execution of jobs
• worker access to tasks via qualification tests
• payments
• task advertising
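The division of labor above can be sketched as a small data model. This is a hypothetical illustration, not an Amazon Mechanical Turk API type: `TaskSpec`, `can_accept`, `assign`, the qualification ID, and the form URL are all invented names. It shows what the requester fixes per task and the qualification gating that the platform, not the requester, enforces.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """What a requester specifies for each task (hypothetical; not an MTurk API type)."""
    qualification_id: str     # hypothetical ID of the qualification test workers must pass
    workers_per_task: int     # how many independent workers complete each task
    pay_per_task_usd: float   # payment to each worker per completed task
    form_url: str             # hypothetical URL of the Web form used to complete the task


def can_accept(spec: TaskSpec, passed: set[str]) -> bool:
    """Gate access as the platform does: only workers holding the qualification may work."""
    return spec.qualification_id in passed


def assign(spec: TaskSpec, worker_quals: dict[str, set[str]]) -> list[str]:
    """Pick up to workers_per_task qualified workers; dict order stands in for arrival order."""
    qualified = [w for w, quals in worker_quals.items() if can_accept(spec, quals)]
    return qualified[: spec.workers_per_task]


spec = TaskSpec("disease-tagging-qual", workers_per_task=2,
                pay_per_task_usd=0.06, form_url="https://example.org/tagging-form")
workers = {
    "w1": {"disease-tagging-qual"},
    "w2": set(),
    "w3": {"disease-tagging-qual"},
    "w4": {"disease-tagging-qual"},
}
print(assign(spec, workers))  # ['w1', 'w3']
```

Worker `w2` never sees the task because it lacks the qualification, and `w4` is turned away only because the two requested assignments are already filled.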

How well can AMT workers, in aggregate,
reproduce a gold standard disease mention
corpus within the text of PubMed abstracts?

Corpus used for comparison
NCBI Disease corpus
• 793 PubMed abstracts
• (100 development, 593 training, 100 test)
• 12 expert annotators (2 annotate each abstract)
6,900 “disease” mentions
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012
Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics.

“Disease”
Phrase is a disease IF:
• it can be mapped to a unique UMLS Metathesaurus
concept in one of these semantic types
• and it contains information helpful to physicians
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012
Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics.

Disease mentions
• Specific Disease: “Diastrophic dysplasia”
• Disease Class: “Cancers”
• Composite Mention: “prostatic , skin , and lung cancer”
• Modifier: …the “familial breast cancer” gene , BRCA2…

Experiment
Identify the disease mentions in 593
abstracts from the NCBI disease corpus
• 6 cents per HIT
• HIT = annotate one abstract from PubMed
• First HIT = survey, next 4 = training, then real
• 10% of the remaining HITs are gold standard tests
• 15 workers annotate each abstract
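The per-worker HIT sequence above can be sketched as a schedule function. This is a minimal sketch under a stated assumption: the slide says only that 10% of the post-training HITs are gold standard tests, so here `hit_type` (a hypothetical name) interleaves them deterministically as every tenth post-training HIT; a real deployment might instead sample them at random.

```python
def hit_type(n: int, gold_every: int = 10) -> str:
    """Kind of HIT a worker receives as their n-th HIT (0-based).

    Assumption: gold-standard checks are interleaved deterministically as every
    gold_every-th HIT after training; the slide only states that 10% are gold.
    """
    if n == 0:
        return "survey"
    if n <= 4:
        return "training"
    k = n - 5  # position among post-training HITs
    return "gold" if k % gold_every == gold_every - 1 else "real"


schedule = [hit_type(n) for n in range(105)]
print(schedule[:6])  # ['survey', 'training', 'training', 'training', 'training', 'real']
print(schedule[5:].count("gold") / len(schedule[5:]))  # 0.1
```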

Instructions
• Task: You will be presented with text from the biomedical literature which we believe may help
resolve some important medical questions. The task is to highlight words and phrases in that
text which are diseases, disease groups, or symptoms of diseases. This work will help
advance research in cancer and many other diseases!
• Highlight all diseases and disease abbreviations
• “...are associated with Huntington disease ( HD )... HD patients
received...”
• “The Wiskott-Aldrich syndrome ( WAS ) , an X-linked
immunodeficiency…”
• Highlight the longest span of text specific to a disease
• “... contains the insulin-dependent diabetes mellitus locus …”
• and not just ‘diabetes’.
• Highlight disease conjunctions as single, long spans.
• “... a significant fraction of familial breast and ovarian cancer patients…”
• Highlight symptoms - physical results of having a disease
• “XFE progeroid syndrome can cause dwarfism, cachexia, and
microcephaly. Patients often display learning disabilities, hearing loss, and
visual impairment.”

Qualification task: Q1
Select all and only the terms that should be
highlighted for each text segment:
1. “Myotonic dystrophy ( DM ) is associated with a ( CTG ) n trinucleotide repeat expansion in
the 3-untranslated region of a protein kinase-encoding gene , DMPK , which maps to
chromosome 19q13 . 3 . ”
• Myotonic
• dystrophy
• Myotonic dystrophy
• DM
• CTG
• trinucleotide repeat expansion
• kinase-encoding gene
• DMPK

2. “Germline mutations in BRCA1 are responsible for most cases of inherited breast
and ovarian cancer . However , the function of the BRCA1 protein has remained
elusive . As a regulated secretory protein , BRCA1 appears to function by a
mechanism not previously described for tumour suppressor gene products.”
• Germline mutations
• BRCA1
• breast
• ovarian cancer
• inherited breast and ovarian cancer
• cancer
• tumour
• tumour suppressor

3. “We report about Dr . Kniest , who first described the condition in 1952 , and his patient ,
who , at the age of 50 years is severely handicapped with short stature , restricted joint
mobility , and blindness but is mentally alert and leads an active life . This is in accordance
with molecular findings in other patients with Kniest dysplasia and…”
• age of 50 years
• severely handicapped
• short
• short stature
• restricted joint mobility
• blindness
• mentally alert
• molecular findings
• Kniest dysplasia
• dysplasia

Qualification task results
• Experiment ran for 9 days
• 346 workers attempted the qualification test
• 145 (42%) passed
(Figure: passing threshold)

Worker demographics: gender
(Figure: fraction of workers, female vs. male; scale 0 to 0.7)
First HIT was a survey

Age
(Figure: fraction of workers by age group: 18-21, 21-35, 36-45, 46 and greater; scale 0 to 0.8)

Occupation
(Figure: fraction of workers by reported occupation; scale 0 to 0.25. Categories: Unemployed, Student, Technical, Science, Computer, Business, Education, Programmer, Art, Retired, Labor, Finance, Legal, Attorney, Team Leader, Human Resources, stay at home mom, Biological Sciences, Bussiness, Caretaker, Administrative Assistant, microbiology graduate student, Transportation Industry, sales, Hardware, Homemaker, manufacturing, Chemical Sciences, mom, Web Assessor, Licensed Practical Nurse, customer service rep)

Education
(Figure: fraction of workers by education; scale 0 to 0.3. Categories: Some high school, Finished high school, Some community college, Finished community college, Some 4-year college, Finished 4-year college, Some masters program, Finished masters program, Some PhD program, Finished PhD program)

Tagging interface
(Screenshot callouts: “Click to see instructions”, “Highlight mentions”)

Feedback
interface:
• Game-like
learning signal
• Either see gold
standard data
or data from
other workers

Results: quantity, cost
• 9 days
• 589 abstracts annotated by 15 different workers
(8,835 tasks completed)
• overhead cost: 4 HITs for training + the survey
• total cost: $630.96
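The reported figures can be cross-checked with a few lines of arithmetic. This sketch uses only numbers from the slides; the "overhead" is simply the difference between the reported total and the pay for the 8,835 annotation HITs. The slides attribute overhead to the survey and training HITs but do not break it down, and they do not say whether gold-standard check HITs are counted within the 8,835.

```python
# Figures reported on the slides.
abstracts = 589
workers_per_abstract = 15
pay_per_hit = 0.06     # USD per annotation HIT
total_cost = 630.96    # USD, reported total

tasks = abstracts * workers_per_abstract          # annotation HITs completed
base_pay = round(tasks * pay_per_hit, 2)          # pay for those HITs alone
overhead = round(total_cost - base_pay, 2)        # remainder of the reported total
per_abstract = round(total_cost / abstracts, 2)   # all-in cost per fully annotated abstract

print(tasks, base_pay, overhead, per_abstract)    # 8835 530.1 100.86 1.07
```

So each abstract, annotated by 15 workers, cost about $1.07 all-in.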

AMT, how it really works
(Figure: diagram of Requester, Tasks, Amazon, Workers, and an Aggregation function)
http://www.thesheepmarket.com/

Increase precision with voting
(Figure: output of the aggregation function on one passage at vote thresholds of 1 or more votes (K=1), K=2, K=3, and K=4)
This molecule inhibits the growth of a broad
panel of cancer cell lines, and is particularly
efficacious in leukemia cells, including
orthotopic leukemia preclinical models as well
as in ex vivo acute myeloid leukemia (AML)
and chronic lymphocytic leukemia (CLL)
patient tumor samples. Thus, inhibition of
CDK9 may represent an interesting approach
as a cancer therapeutic target especially in
hematologic malignancies.
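The voting rule can be sketched as a per-span threshold: a highlight survives only if at least K workers made it, so raising K drops idiosyncratic highlights (higher precision) at the cost of missing some real mentions (lower recall). This is a minimal sketch, assuming highlights are hypothetical (start, end) character offsets and that votes are counted per exact span; the slides do not say how partially overlapping highlights are handled.

```python
from collections import Counter

Span = tuple[int, int]  # hypothetical (start, end) character offsets of a highlight


def aggregate(annotations: list[set[Span]], k: int) -> set[Span]:
    """Keep spans highlighted by at least k workers.

    Each worker contributes a set, so one worker casts at most one vote per span.
    Assumption: votes are counted per exact span; partial overlaps are not merged.
    """
    votes = Counter(span for worker_spans in annotations for span in worker_spans)
    return {span for span, n in votes.items() if n >= k}


# Toy highlights from three workers on one passage.
workers = [
    {(0, 8), (20, 29)},
    {(0, 8)},
    {(0, 8), (40, 55)},
]
for k in (1, 2, 3):
    print(k, sorted(aggregate(workers, k)))
```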

Results: 589 abstracts
compared to gold standard
F = 0.87 at K = 6
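The F score here is the harmonic mean of precision and recall against the gold standard, and choosing K amounts to picking the threshold with the best F. A minimal sketch follows, assuming exact span matching (the slides do not state the matching criterion) and toy data in place of the real corpus.

```python
def prf(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted mentions against gold (exact match assumed)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: gold spans and the aggregated crowd output at each vote threshold K.
gold = {(0, 8), (20, 29)}
by_k = {
    1: {(0, 8), (20, 29), (40, 55)},
    2: {(0, 8), (20, 29)},
    3: {(0, 8)},
}
for k, predicted in by_k.items():
    print(k, [round(x, 3) for x in prf(predicted, gold)])
best_k = max(by_k, key=lambda k: prf(by_k[k], gold)[2])
print("best K:", best_k)  # best K: 2
```

On the toy data, K=1 keeps a spurious span (precision falls) and K=3 loses a real one (recall falls), so the intermediate threshold wins; the real experiment found its best trade-off at a higher K.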

Inter-Annotator agreement among
experts, NCBI Disease corpus
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of
the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
0.76
0.87
Average level
of agreement
between expert
annotators
(stage 1)

Professionals achieve equivalent
agreement only after reviewing each
other’s annotations.

In aggregate, our worker ensemble is faster,
cheaper and more accurate than a single
expert annotator for this task
• Experts had consistency (F) with other experts = 0.76
• Only after viewing each other’s annotations did experts
reach 0.87 consistency
• The turker ensemble had consistency with the finalized
standard = 0.87 (with access to much less information)

We are not alone
• Mortensen et al. (2014): 25 workers, 2¢/task = 1 biomedical
ontology expert. “Using the wisdom of the crowds to find critical
errors in biomedical ontologies: a study of SNOMED CT”. JAMIA.
• Burger et al. (2014): 5 workers, 7¢/task = 1 expert curator.
“Hybrid curation of gene–mutation relations combining automated
extraction and crowdsourcing”. Database.
• Zhai et al. (2013): 5 workers, 3¢/task = 1 expert curator.
“Web 2.0-Based Crowdsourcing for High-Quality Gold Standard
Development in Clinical Natural Language Processing”. J Med
Internet Res.
• ... more (e.g. IBM Research “Crowd Watson” project by Aroyo
and Welty)

To do list
• Machine learning experiment on TopCoder
• Citizen Science (volunteer) implementation of
this
• New tasks

mturk -> machine learning
• The main purpose of building this
particular corpus was to train a
disease tagging algorithm.

Next Steps with Disease
Corpus
• We have assembled a new
1,000 document corpus
• (took 6 days)
• Simply adding it to the
training data didn’t help
• Execute TopCoder contest
to produce a better
algorithm.

could we just do them all?
• we peaked at a rate of 500 abstracts processed
per day (assuming 5 workers/doc)
• 284 workers contributing in a span of 6 days
• at 1 million abstracts/year we would need to reach
2,700/day to do them all
• $0.066 × 5 × 1,000,000 = $330,000
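The scale-up estimate can be reproduced from the slide's own figures. This sketch only redoes the arithmetic: 1 million abstracts a year works out to about 114 per hour (consistent with the ">100/hour" rate quoted at the start) and about 2,740 per day, which the slide rounds to 2,700, roughly 5.5 times the observed peak of 500 per day.

```python
abstracts_per_year = 1_000_000
workers_per_doc = 5
cost_per_task = 0.066   # USD, figure used on the slide
peak_per_day = 500      # observed peak throughput (abstracts/day)

per_hour = abstracts_per_year / (365 * 24)       # abstracts arriving per hour
required_per_day = abstracts_per_year / 365      # throughput needed to keep up
annual_cost = cost_per_task * workers_per_doc * abstracts_per_year
speedup_needed = required_per_day / peak_per_day  # multiple of the observed peak

print(round(per_hour), round(required_per_day), round(annual_cost), round(speedup_needed, 1))
# 114 2740 330000 5.5
```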

Moving towards $0/task and
many more workers
• mark2cure.org
• A citizen science portal
for volunteers to do the
same stuff
• first experiment will
recapitulate results
from AMT

Information Extraction
1. Find mentions of high level concepts in text
2. Map mentions to specific terms in ontologies
3. Identify relationships between concepts

?!?!
Effect on curcumin on cholesterol gall-stone induction.
Influence of dietary capsaicin and curcumin during
experimental induction of cholesterol gallstone in mice.
Spice bioactive compounds, capsaicin and curcumin, were
both individually and in combination examined for antilithogenic
potential during experimental induction of cholesterol
gallstones in mice.
70,364,020 subject-predicate-object relations

Thanks
Max Nanis, Andrew Su, Ginger Tsueng, Chunlei Wu
Mechanical Turk Workers!
@bgood
bgood@scripps.edu

Could do well with far fewer workers.
