
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015

Federal Community of Practice for Crowdsourcing and Citizen Science meeting on Games

Published in: Science

  1. Andrew Su, Ph.D. · @andrewsu · asu@scripps.edu · http://sulab.org · Slides: slideshare.net/andrewsu
     • Gene-specific review article for every human gene
     • Data integration for genes, drugs, diseases
     • Robust classifiers of breast cancer prognosis
     • Annotation of biomedical literature
     • Expert-guided classifier design
     • Gene-centric web portal
     • Bioinformatics algorithm optimization
  2. Mark2Cure – biocuration by microtasking
     • Challenge: The biomedical literature is massive and growing exponentially, but it is largely inaccessible
     • Opportunity: Better access to existing knowledge can make the scientific process more efficient and productive
     • Current situation
       – Manual biocuration by experts
       – Natural language processing
  3. Mark2Cure – biocuration by microtasking
     • Our approach: Use the Amazon Mechanical Turk platform for paid microtask crowdsourcing
     • Results: reproduced an expert-generated gold standard at equivalent accuracy, in a shorter time, at a fraction of the cost
     • 593 documents
     • 9 days
     • 145 workers
     • $0.06 / task
     • Total cost: $630.96
     • [Figure: precision and recall; K = 6, F score = 0.87]
  4. Mark2Cure – biocuration by citizen science
     • Our approach: Use volunteer-based citizen science for microtask crowdsourcing
     • Results: reproduced an expert-generated gold standard at equivalent accuracy, in a shorter time, at no cost
     • 593 documents
     • 28 days
     • 212 workers
     • Total cost: $0.00
     • [Figure: precision and recall vs. voting threshold; k = 6, F score = 0.84]
     • http://mark2cure.org
  5. Collaborative knowledge management
     • Challenge: Biomedical research allows for genome-scale profiling, but few genes are previously known to the researcher
     • Opportunity: Better access to existing knowledge can make the scientific process more efficient and productive
     • Current situation
       – Review articles (but sparse coverage)
       – Lots of reading of primary literature
  6. Collaborative knowledge management
     • Our approach: Create a gene-specific review article for every human gene that is collaboratively written, continuously updated, and community reviewed
     • Results: 5M page views and >1000 edits per month
  7. Collaborative knowledge management
     • Our approach: Create a gene-specific Wikidata database entry for every human gene that is collaboratively integrated, continuously updated, and community reviewed
     • Results: all human genes and diseases loaded in Wikidata, soon to have drugs and relationships
  8. Bioinformatics algorithm optimization
     • Challenge: Antibody sequence clustering is computationally expensive (CPU and memory)
     • Opportunity: Large-scale clustering of antibody sequences can aid vaccine development
     • Current situation: Research-grade code can cluster ~100k sequences in 1.7 hours on a high-memory (150 GB) machine
  9. Bioinformatics algorithm optimization
     • Our approach: Ran TopCoder contest for 10 days, offering $7500 in prize money
     • Results: Best solution can cluster 2.3M sequences in 30 seconds on a typical desktop computer (1.1 GB)
     • [Figure: Benchmarks; log(execution time) vs. log(# sequences processed)]
  10. Acknowledgments and contact
     • Other group members: Cyrus Afrasiabi, Ramya Gamini, Louis Gioia, Salvatore Loguercio, Adam Mark, Erick Scott, Greg Stupp, Kevin Xin
     • Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su
     • Mark2Cure: Ben Good, Max Nanis, Ginger Tsueng, Chunlei Wu, All Mark2Curators!
     • Funding and Support
       – BioGPS: GM83924
       – Gene Wiki: GM089820
       – BD2K Center of Excellence: GM114833
     • Gene Wiki: Ben Good, Sebastian Burgstaller, Andra Waagmeester, Elvira Mitraka (UMB), Lynn Schriml (UMB), Paul Pavlidis (UBC), Gang Fu (NCBI)
     • Contests: Chunlei Wu, Ben Good, Brian Briney (TSRI), Dennis Burton (TSRI), Rinat Sergeev (HBS), Jin Paik (HBS), Karim Laklani (HBS), Jingbo Shang, Rashid Sial (Appirio)
     • Join the team! bit.ly/sulabawesome
  11. Game for breast cancer prognosis
     • Challenge: Genomic classifiers of disease are difficult to train in a way that consistently validates on secondary datasets
     • Opportunity: Better classifiers of disease diagnosis and/or prognosis have many clinical applications
     • Current situation: Most attempts to train classifiers rely on machine learning methods that utilize little or no biological knowledge
  12. Game for breast cancer prognosis
     • Our approach: Enlist a crowd of expert game players with diverse perspectives to identify the most biologically relevant genes
     • Results: Gene sets derived from game player data showed comparable performance to expert-generated gene sets
     • 1077 registered players
     • 15,669 games played
     • Demographics
       – 59% male, 41% female
       – 21-29 is the most frequent age group
       – 35% had a graduate degree, 32% were biologists
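
The Mark2Cure slides (3 and 4) report precision, recall, and an F score at a voting threshold of k = 6, but do not spell out how the crowd annotations were aggregated or scored. The sketch below shows one plausible reading, assuming an annotation is accepted once at least k distinct workers have marked it and that the F score is the usual harmonic mean of precision and recall. The function names, the (document, term) tuple representation, and the toy data are all hypothetical and are not taken from Mark2Cure.

```python
from collections import Counter


def aggregate_by_votes(worker_annotations, k):
    """Keep annotations marked by at least k distinct workers.

    worker_annotations maps a worker id to the annotations that worker made.
    (Hypothetical representation; each annotation is a (document, term) tuple.)
    """
    votes = Counter()
    for annotations in worker_annotations.values():
        votes.update(set(annotations))  # at most one vote per worker per annotation
    return {annotation for annotation, n in votes.items() if n >= k}


def precision_recall_f(predicted, gold):
    """Score a predicted annotation set against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data, invented purely for illustration.
gold = {("doc1", "diabetes"), ("doc1", "obesity"), ("doc2", "asthma")}
workers = {
    "w1": [("doc1", "diabetes"), ("doc1", "obesity"), ("doc2", "asthma")],
    "w2": [("doc1", "diabetes"), ("doc2", "asthma"), ("doc2", "cough")],
    "w3": [("doc1", "diabetes"), ("doc1", "insulin")],
}

# Sweep the voting threshold, as the "voting threshold" axis on slide 4 suggests.
for k in (1, 2, 3):
    p, r, f = precision_recall_f(aggregate_by_votes(workers, k), gold)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Raising k generally trades recall for precision, which is presumably why the slides single out one threshold on the curve.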

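
The algorithm-optimization slides (8 and 9) give a before and after benchmark: ~100k sequences in 1.7 hours on a 150 GB machine, versus 2.3M sequences in 30 seconds on a 1.1 GB desktop. As a rough illustration of the size of that gap, the arithmetic below compares sequences per second and memory footprint. It treats the slide figures as exact and compares raw throughput only. The two runs used different hardware and input sizes, and clustering cost need not scale linearly, so the ratio is indicative rather than a controlled speedup. All variable names are hypothetical.

```python
# Figures as stated on slides 8 and 9 (the baseline "~100k" is approximate).
baseline_sequences = 100_000
baseline_seconds = 1.7 * 3600      # 1.7 hours
baseline_memory_gb = 150

winner_sequences = 2_300_000
winner_seconds = 30
winner_memory_gb = 1.1

# Throughput in sequences per second for each implementation.
baseline_throughput = baseline_sequences / baseline_seconds
winner_throughput = winner_sequences / winner_seconds

throughput_ratio = winner_throughput / baseline_throughput
memory_ratio = baseline_memory_gb / winner_memory_gb

print(f"baseline: {baseline_throughput:.1f} sequences/s")
print(f"contest winner: {winner_throughput:.1f} sequences/s")
print(f"throughput ratio: about {throughput_ratio:.0f}x")
print(f"memory ratio: about {memory_ratio:.0f}x smaller")
```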