Microtask crowdsourcing for disease mention annotation in PubMed abstracts

Benjamin M. Good, Max Nanis, Andrew I. Su

Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses that would otherwise be impossible. As a result, many biological natural language processing (BioNLP) projects attempt to address this challenge. However, the state of the art in BioNLP still leaves much room for improvement in terms of precision, recall and the complexity of knowledge structures that can be extracted automatically. Expert curators are vital to the process of knowledge extraction but are always in short supply. Recent studies have shown that workers on microtasking platforms such as Amazon’s Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text.

Here, we investigated the use of AMT for capturing disease mentions in PubMed abstracts. We used the recently published NCBI Disease corpus as a gold standard for refining and benchmarking the crowdsourcing protocol. After merging the responses from 5 AMT workers per abstract with a simple voting scheme, we achieved a maximum F-measure of 0.815 (precision 0.823, recall 0.807) over 593 abstracts, as compared to the NCBI annotations on the same abstracts. Comparisons were based on exact matches to annotation spans. The results can also be tuned to optimize for precision (max = 0.98 when recall = 0.23) or recall (max = 0.89 when precision = 0.45). It took 7 days and cost $192.90 to complete all 593 abstracts considered here (at $0.06 per worker per abstract, with 50 additional abstracts used for spam detection).
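The voting and exact-match comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: spans are assumed to be (start, end) character offsets, a vote is assumed to count only when two workers mark identical boundaries, and the names `aggregate_by_votes`, `exact_match_scores` and `f_measure`, as well as the toy data, are hypothetical.

```python
from collections import Counter

# Assumed representation: a span is a (start, end) pair of character offsets.
Span = tuple[int, int]


def aggregate_by_votes(worker_spans: list[set[Span]], k: int) -> set[Span]:
    """Keep every span that at least k workers marked with identical boundaries."""
    votes = Counter(span for spans in worker_spans for span in spans)
    return {span for span, n in votes.items() if n >= k}


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def exact_match_scores(predicted: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Precision, recall and F-measure, counting only exact span matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy data invented for illustration: five workers annotating one abstract.
workers = [
    {(0, 18), (40, 46)},
    {(0, 18), (40, 46)},
    {(0, 18), (60, 70)},
    {(0, 8)},
    {(0, 18), (40, 46)},
]
gold = {(0, 18), (40, 46), (80, 90)}

for k in range(1, 6):
    p, r, f = exact_match_scores(aggregate_by_votes(workers, k), gold)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")

# Consistency check on the figures reported in the abstract.
print(round(f_measure(0.823, 0.807), 3))
```

Raising `k` can only remove spans from the aggregate, so recall cannot increase; precision typically rises because spans marked by a single worker are dropped first. That is the trade-off the abstract describes when it says the results can be tuned toward precision or recall.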

This experiment demonstrated that microtask-based crowdsourcing can be applied to the disease mention recognition problem in the text of biomedical research articles. The F-measure of 0.815 indicates that there is room for improvement in the crowdsourcing protocol but that, overall, AMT workers are clearly capable of performing this annotation task.

Published in: Science


  1. Microtask crowdsourcing for disease mention annotation in PubMed abstracts
     Benjamin Good, Max Nanis, Andrew Su
     The Scripps Research Institute
     @bgood
  2. Long term goal: improve information extraction from text
     • Rapid growth of text (chart labels: pubs/year, >100/hour)
     • Existing computational methods
       - are not perfect
       - need training data
  3. Information Extraction
     1. Find mentions of high level concepts in text
     2. Map mentions to specific terms in ontologies
     3. Identify relationships between concepts
  4. Crowdsourcing
     There is accumulating evidence that many non-expert members of ‘the crowd’ can read English well enough to help with many information extraction tasks - even in complex biomedical text
     (Zhai 2013, Aroyo 2013, Burger 2014)
  5. Microtask Crowdsourcing
     • Distribute discrete units of work (aka “human intelligence tasks” or HITs) to many workers in parallel who are paid to solve them.
     • Reported 500,000 registered workers in 2011 [1]
     [1] Paritosh P, Ipeirotis P, Cooper M, Suri S: The computer is the new sewing machine: benefits and perils of crowdsourcing. WWW '11 2011:325–326.
  6. AMT, how it works
     (diagram labels: Requester, Tasks, Amazon, Workers)
     For each task, specify:
     • a qualification test
     • how many workers per task
     • how much we will pay per task
     • in this case, a link to a website that we host where they can complete the task.
     Interact directly with Amazon system
     Manages:
     • parallel execution of jobs
     • worker access to tasks via qualification tests
     • payments
     • task advertising
  7. How well can AMT workers, in aggregate, reproduce a gold standard disease mention corpus within the text of PubMed abstracts?
  8. Corpus used for comparison
     NCBI Disease corpus
     • 793 PubMed abstracts (100 development, 593 training, 100 test)
     • 12 expert annotators (2 annotate each abstract)
     • 6,900 “disease” mentions
     Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
  9. Disease
     Phrase is a disease IF:
     • it can be mapped to a unique UMLS Metathesaurus concept in one of these semantic types
     • and it contains information helpful to physicians
     Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
  10. Disease mentions
      • Specific Disease: “Diastrophic dysplasia”
      • Disease Class: “Cancers”
      • Composite Mention: “prostatic , skin , and lung cancer”
      • Modifier: ..the “familial breast cancer” gene , BRCA2..
  11. Instructions
      • Task: You will be presented with text from the biomedical literature which we believe may help resolve some important medical questions. The task is to highlight words and phrases in that text which are diseases, disease groups, or symptoms of diseases. This work will help advance research in cancer and many other diseases!
      • Highlight all diseases and disease abbreviations
        “...are associated with Huntington disease ( HD )... HD patients received...”
        “The Wiskott-Aldrich syndrome ( WAS ) , an X-linked immunodeficiency…”
      • Highlight the longest span of text specific to a disease
        “... contains the insulin-dependent diabetes mellitus locus …” and not just ‘diabetes’.
      • Highlight disease conjunctions as single, long spans.
        “... a significant fraction of familial breast and ovarian cancer , but undergoes…”
      • Highlight symptoms - physical results of having a disease!
        “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”
  12. Qualification task: Q1
      Select all and only the terms that should be highlighted for each text segment:
      1. “Myotonic dystrophy ( DM ) is associated with a ( CTG ) n trinucleotide repeat expansion in the 3-untranslated region of a protein kinase-encoding gene , DMPK , which maps to chromosome 19q13 . 3 . ”
      • Myotonic
      • dystrophy
      • Myotonic dystrophy
      • DM
      • CTG
      • trinucleotide repeat expansion
      • kinase-encoding gene
      • DMPK
  13. Qualification task: Q2
      2. “Germline mutations in BRCA1 are responsible for most cases of inherited breast and ovarian cancer . However , the function of the BRCA1 protein has remained elusive . As a regulated secretory protein , BRCA1 appears to function by a mechanism not previously described for tumour suppressor gene products.”
      • Germline mutations
      • BRCA1
      • breast
      • ovarian cancer
      • inherited breast and ovarian cancer
      • cancer
      • tumour
      • tumour suppressor
  14. Qualification task: Q3
      3. “We report about Dr . Kniest , who first described the condition in 1952 , and his patient , who , at the age of 50 years is severely handicapped with short stature , restricted joint mobility , and blindness but is mentally alert and leads an active life . This is in accordance with molecular findings in other patients with Kniest dysplasia and…”
      • age of 50 years
      • severely handicapped
      • short
      • short stature
      • restricted joint mobility
      • blindness
      • mentally alert
      • molecular findings
      • Kniest dysplasia
      • dysplasia
  15. Qualification task results
      • 33/194 passed (17%)
      (chart labels: threshold for passing, workers, qualified workers)
  16. Tagging interface
      (screenshot labels: click to see instructions, highlight mentions)
  17. Experiment
      Identify the disease mentions in the 593 abstracts from the NCBI disease corpus
      • 6 cents per HIT
      • HIT = annotate one abstract from PubMed
      • 5 workers annotate each abstract
  18. AMT, how it really works
      (diagram labels: Requester, Tasks, Amazon, Aggregation function, Workers)
      http://www.thesheepmarket.com/
  19. Increase precision with voting
      Aggregation function. The slide shows the same passage at four vote thresholds: 1 or more votes (K=1), K=2, K=3 and K=4 (highlighting not preserved in this transcript).
      This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.
  20. Results: 593 abstracts compared to gold standard
      • 7 days
      • $192.90
      • 17 workers
      • F = 0.81, k = 2
  21. Inter-Annotator agreement among experts, NCBI Disease corpus
      (chart values: 0.76, 0.87; annotation: average level of agreement between expert annotators (stage 1))
      Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
  22. In aggregate, our worker ensemble is faster, cheaper and as accurate as a single expert annotator for this task
      • Experts had consistency (F) with other experts = 0.76.
      • The turker ensemble had consistency with the finalized standard = 0.81
  23. Summary
      • Some members of the crowd can tag “disease” mentions in PubMed abstracts with comparable accuracy to experts
      • This was nontrivial to set up
      • We can now generate disease mention annotations at a rate of about 500 abstracts and $150 per week
      • Next step: mentions to concepts…
  24. The Future
      • It looks like, if we want to, we can have access to much larger sets of annotated corpora than ever before
      • The annotations are different
      • New ways of using and evaluating IE algorithms are needed [1].
      [1] Aroyo, Lora, and Chris Welty. Harnessing disagreement in crowdsourcing a relation extraction gold standard. Tech. Rep. RC25371 (WAT1304-058), IBM Research, 2013.
  25. Thanks
      Max Nanis
      Andrew Su
      Mechanical Turk Workers!
      @bgood
      bgood@scripps.edu
  26. Try it yourself!
      • GATE crowdsourcing plugin. http://gate.ac.uk/wiki/crowdsourcing.html
      • Or you can try our code at https://bitbucket.org/sulab/mark2cure/
      • And present your findings at the crowdsourcing session at the Pacific Symposium on Biocomputing, January 2015, Big Island, Hawaii
  27. Clarification…
      • This is NOT a replacement for professional annotators
      • This IS a tool that could be used by professional annotators
  28. Related work
      • [1] Zhai et al. 2013 used a similar protocol to tag medication names in clinical trial descriptions. F = 0.88 compared to gold standard
      • [2] Burger et al. used microtask workers to identify relationships between genes and mutations.
      • [3] Aroyo & Welty used workers to identify relations between concepts in medical text.
      [1] Zhai H. et al. (2013) “Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing.” J Med Internet Res
      [2] Burger, John, et al. (2014) “Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing.” Mitre technical report
      [3] Aroyo, Lora, and Chris Welty. Harnessing disagreement in crowdsourcing a relation extraction gold standard. Tech. Rep. RC25371 (WAT1304-058), IBM Research, 2013.
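
The cost figures quoted in the slides and the abstract can be reproduced with simple arithmetic. The sketch below assumes that the totals cover worker payments only, with no platform fee added (the source does not say either way); the function name `worker_payments` is hypothetical. The pay rate and worker count come from the Experiment slide, the 50 spam-detection abstracts from the abstract, and the weekly throughput from the Summary slide.

```python
from decimal import Decimal

PAY_PER_HIT = Decimal("0.06")  # 6 cents per HIT (Experiment slide)
WORKERS_PER_ABSTRACT = 5       # 5 workers annotate each abstract (Experiment slide)


def worker_payments(n_abstracts: int) -> Decimal:
    """Total paid to workers; platform fees are ignored (an assumption)."""
    return n_abstracts * WORKERS_PER_ABSTRACT * PAY_PER_HIT


# 593 corpus abstracts plus the 50 extra abstracts used for spam detection.
experiment_cost = worker_payments(593 + 50)

# Throughput quoted in the Summary slide: about 500 abstracts per week.
weekly_cost = worker_payments(500)

print(experiment_cost)  # 192.90
print(weekly_cost)      # 150.00
```

Both results match the reported $192.90 for the experiment and roughly $150 per week, which suggests the quoted $0.06 is the payment per worker per abstract rather than the total cost per abstract.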