Microtask crowdsourcing for disease mention annotation in PubMed abstracts

Benjamin M. Good, Max Nanis, Andrew I. Su

Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses that would otherwise be impossible. As a result, many biological natural language processing (BioNLP) projects attempt to address this challenge. However, the state of the art in BioNLP still leaves much room for improvement in terms of precision, recall and the complexity of knowledge structures that can be extracted automatically. Expert curators are vital to the process of knowledge extraction but are always in short supply. Recent studies have shown that workers on microtasking platforms such as Amazon’s Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text [1].

Here, we investigated the use of AMT for capturing disease mentions in PubMed abstracts. We used the recently published NCBI Disease corpus [2] as a gold standard for refining and benchmarking the crowdsourcing protocol. After merging the responses from 5 AMT workers per abstract with a simple voting scheme, we achieved a maximum F-measure of 0.815 (precision 0.823, recall 0.807) over 593 abstracts as compared with the NCBI annotations on the same abstracts. Comparisons were based on exact matches to annotation spans. The results can also be tuned to optimize for precision (max = 0.98 when recall = 0.23) or for recall (max = 0.89 when precision = 0.45). It took 7 days and cost $192.90 to complete all 593 abstracts considered here (at $0.06/abstract, with 50 additional abstracts used for spam detection).
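The merging and benchmarking described above can be sketched in a few lines of code. The snippet below is a minimal, hypothetical reconstruction rather than the authors' actual pipeline: it assumes each worker's response is a set of (start, end) character spans for one abstract, keeps a span when at least k of the five workers marked exactly that span, and scores the merged set against the gold-standard spans by exact match. The function names and the toy data are invented for illustration.

```python
from collections import Counter

def aggregate_by_vote(responses, k=2):
    """Merge workers' annotations for one abstract with a simple voting scheme.

    `responses` is a list with one entry per worker; each entry is the set of
    (start, end) character spans that worker highlighted. A span is kept when
    at least `k` workers marked exactly that span.
    """
    votes = Counter(span for spans in responses for span in set(spans))
    return {span for span, n in votes.items() if n >= k}

def exact_match_prf(predicted, gold):
    """Precision, recall and F-measure based on exact span matches."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

# Toy example: five workers' spans for one abstract, and one gold span.
workers = [{(10, 28)}, {(10, 28), (40, 52)}, {(10, 28)}, set(), {(40, 52)}]
gold = {(10, 28)}
print(exact_match_prf(aggregate_by_vote(workers, k=2), gold))  # (0.5, 1.0, 0.666...)
```

Sweeping k from 1 (favours recall) to 5 (unanimity, favours precision) is one way to reproduce the precision/recall trade-off reported above.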

This experiment demonstrated that microtask-based crowdsourcing can be applied to the disease mention recognition problem in the text of biomedical research articles. The F-measure of 0.815 indicates that there is still room for improvement in the crowdsourcing protocol, but also that, overall, AMT workers are clearly capable of performing this annotation task.


Poster: Microtask crowdsourcing for disease mention annotation in PubMed abstracts

Benjamin M. Good, Max Nanis, Andrew I. Su
The Scripps Research Institute, La Jolla, California, USA

QUESTION: To what extent can we reproduce the expert-created NCBI Disease corpus [2] with non-expert annotators?

Microtask crowdsourcing, key points
  • Hundreds of thousands of workers working in parallel
  • Workers perform ‘human intelligence tasks’ (HITs) and are paid for each HIT: (1) take a HIT-specific qualification exam, (2) perform HITs, (3) get paid
  • Tasks are managed programmatically (a sketch of posting one HIT follows the Key parameters list below)
  • Quality is managed by qualification tests and test HITs with known answers, and by redundancy and aggregation: multiple workers perform each task and their work is merged programmatically

Microtask: highlight “diseases” in a PubMed abstract
Example disease mentions and annotation rules:
  • Highlight all diseases and disease abbreviations: “...are associated with Huntington disease ( HD )... HD patients received...”; “The Wiskott-Aldrich syndrome ( WAS ) …”
  • Highlight the longest span of text specific to a disease: “... contains the insulin-dependent diabetes mellitus locus …” (and not just ‘diabetes’); “...was initially detected in four of 33 colorectal cancer families…”
  • Highlight disease conjunctions as single, long spans: “...the life expectancy of Duchenne and Becker muscular dystrophy patients...”; “... a significant fraction of familial breast and ovarian cancer , but undergoes…”
  • Highlight symptoms, the physical results of having a disease: “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”
  • Highlight all occurrences of disease terms: “Women who carry a mutation in the BRCA1 gene have an 80 % risk of breast cancer by the age of 70. Individuals who have rare alleles of the VNTR also have an increased risk of breast cancer ( 2-4 ).”

Qualification test: identify the correct annotations in three representative abstract snippets. Scores range from 0 to 26; passing requires a score greater than 21.

Annotation Interface: (screenshot of the highlighting interface shown on the poster)

Key parameters
  • AMT platform
  • $0.06/abstract
  • 593 abstracts
  • 5 workers per abstract
  • No worker filters beyond the qualification test
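As a concrete illustration of the “tasks are managed programmatically” point and the key parameters above, here is a minimal sketch of posting one abstract as a HIT with five assignments at $0.06. It assumes the boto3 MTurk client (the poster does not say which library was used), and the annotation-interface URL, qualification type ID, and timing values are hypothetical placeholders.

```python
import boto3

# Hypothetical sketch of posting one abstract as a HIT; boto3's MTurk client is
# assumed here (the poster does not name a client library).
mturk = boto3.client("mturk", region_name="us-east-1")

# ExternalQuestion pointing at a hosted annotation interface (placeholder URL).
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.org/annotate?pmid=PMID_GOES_HERE</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
""".strip()

response = mturk.create_hit(
    Title="Highlight disease mentions in a PubMed abstract",
    Description="Highlight every disease name, disease abbreviation and symptom in the abstract.",
    Keywords="annotation, biomedical, disease, text",
    Reward="0.06",                    # $0.06 per abstract
    MaxAssignments=5,                 # 5 workers per abstract
    AssignmentDurationInSeconds=600,  # placeholder time limit per assignment
    LifetimeInSeconds=7 * 24 * 3600,  # placeholder: keep the HIT listed for a week
    Question=question_xml,
    QualificationRequirements=[{
        # Custom qualification: score on the 26-point test must exceed 21.
        "QualificationTypeId": "REPLACE_WITH_QUALIFICATION_TYPE_ID",
        "Comparator": "GreaterThan",
        "IntegerValues": [21],
    }],
)
print("Created HIT", response["HIT"]["HITId"])
```

Each of the 593 abstracts (plus the 50 spam-detection abstracts) would be posted this way; completed assignments can then be retrieved, for example with the client's list_assignments_for_hit call, and merged by the voting scheme described in the abstract.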
RESULTS, reproducing the NCBI training corpus
Top aggregate performance: F = 0.815, precision = 0.823, recall = 0.807. (K = 2 means that at least 2 workers independently created each annotation.)

Experiments, key points
  • It took three design iterations to produce the current system
  • Even the first attempt was better than automated methods
  • The task can now be performed at about 500 abstracts and $200 per week
  • Only 33 of 194 workers passed the qualification test
  • Only 17 workers contributed to the training corpus task
  • How might we increase the number of workers while maintaining high quality?

Ongoing experiments: iterate, iterate, iterate
Current hypothesis: train workers using feedback provided after each completed task, based on embedded gold standard annotations and on annotations from other workers.

REFERENCES
1. Zhai, Haijun, et al. “Web 2.0-based crowdsourcing for high-quality gold standard development in clinical natural language processing.” Journal of Medical Internet Research 15.4 (2013).
2. Doğan, Rezarta Islamaj, and Zhiyong Lu. “An improved corpus of disease mentions in PubMed citations.” Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.

FUNDING
We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924).

CONTACT
Benjamin Good: bgood@scripps.edu
Andrew Su: asu@scripps.edu
We are hiring! We are looking for postdocs and programmers interested in crowdsourcing and bioinformatics; contact asu@scripps.edu.
