Using Citizen Science to organize biomedical knowledge

Andrew Su
Professor at The Scripps Research Institute
Mar. 5, 2015

Using Citizen Science to organize biomedical knowledge

  1. Using Citizen Science to organize biomedical knowledge Andrew Su, Ph.D. @andrewsu asu@scripps.edu http://sulab.org March 5, 2015 Future of Genomic Medicine Slides posted at slideshare.net/andrewsu
  2. Candidate genes: FLNB, CTNNB1, EPHA3, SMAD3, XPO1, RPS27, FLCN, ATR, FLT3, BRD2, ERG, RAF1, EGFR, ERBB4, RARA, JAK3, LRP1, WT1, PML, SMARCA4, …
  3. The biomedical literature is growing fast… [Chart: number of new PubMed-indexed articles per year, 1983 to 2013; y-axis from 0 to 1,200,000]
  4. … but it is very hard to query and compute
  5. … but it is very hard to query and compute: (Imatinib, Crizotinib, Erlotinib, Gefitinib, Sorafenib, Lapatinib, Dasatinib, …) AND (Acute myeloid leukemia, Acute lymphoblastic leukemia, Chronic myelogenous leukemia, Chronic lymphocytic leukemia, Hodgkin lymphoma, Non-Hodgkin lymphoma, Myeloma, …)
  6. Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable. [Diagram: network linking Pathways, Diseases, Proteins, Variants, Genes, Drugs]
  7. Information Extraction: (1) Identify high-level concepts in text; (2) Identify relationships between concepts
  8. NCBI Disease Corpus: 593 PubMed abstracts; 12 expert annotators (2 per document); 6,900 “disease concept” mentions. Source: Doğan and Lu. Proceedings of the 2012 Workshop on BioNLP, 2012, 91-9.
  9. Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?
  10. Amazon Mechanical Turk (AMT). Parties: Requester, Amazon, Workers. Steps: (1) Create tasks; (2) Execute; (3) Aggregate
  11. Experimental design. Task: Identify the “disease concepts” in the 593 abstracts from the NCBI disease corpus – $0.06 per Human Intelligence Task (HIT) – HIT = annotate one abstract from PubMed – 15 workers annotate each abstract
  12. Comparison to gold standard: K = 6, F score = 0.87 • 593 documents • 15 users / doc • 9 days • 145 workers • $630.96 [Plot: precision and recall]
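
The aggregation and scoring behind the slide above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes K is the minimum number of workers who must mark the same span for it to count, that spans are compared by exact character offsets, and that the F score is the usual harmonic mean of precision and recall. The function names and the toy data are hypothetical.

```python
from collections import Counter


def aggregate_by_votes(worker_annotations, k):
    """Keep every span that at least k workers marked (assumed voting rule)."""
    votes = Counter()
    for spans in worker_annotations:
        votes.update(set(spans))  # at most one vote per worker per span
    return {span for span, count in votes.items() if count >= k}


def precision_recall_f(predicted, gold):
    """Score a set of predicted spans against gold spans by exact match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data (hypothetical): three workers, spans as (start, end) offsets.
workers = [
    {(0, 12), (30, 42)},
    {(0, 12), (50, 58)},
    {(0, 12), (30, 42)},
]
gold = {(0, 12), (30, 42), (70, 80)}

consensus = aggregate_by_votes(workers, k=2)
p, r, f = precision_recall_f(consensus, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Raising `k` generally trades recall for precision, which is why a single threshold (K = 6 of 15 on the slide) has to be chosen.
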
  13. Comparisons to text-mining algorithms [Chart: F score for text-mining algorithms and AMT experiments]
  14. Comparisons to human annotators: average level of agreement between expert annotators (stage 1), F = 0.76
  15. Comparisons to human annotators: F = 0.76; F = 0.87. Average level of agreement between expert annotators (stage 2)
  16. Does Mechanical Turk scale? 1,000,000 articles per year × 10 annotators / article × 4 tasks / doc × $0.06 / task = $2,400,000 / year
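
The cost estimate on the slide above is a straight product of its four figures. A small sketch of that arithmetic, with a hypothetical function name and the price held in whole cents to avoid floating-point rounding:

```python
def annual_cost_dollars(articles_per_year, annotators_per_article,
                        tasks_per_doc, price_cents_per_task):
    """Yearly crowdsourcing cost in dollars, using the slide's four factors."""
    total_tasks = articles_per_year * annotators_per_article * tasks_per_doc
    return total_tasks * price_cents_per_task / 100


# Figures taken from the slide: 1,000,000 articles, 10 annotators each,
# 4 tasks per document, $0.06 (6 cents) per task.
cost = annual_cost_dollars(1_000_000, 10, 4, 6)
print(f"${cost:,.0f} / year")
```
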
  17. Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts, and will they do it for free?
  18. http://mark2cure.org
  19. Mark2Cure Campaign #0 • Goal: replicate the NCBI disease corpus – 593 documents, 15x redundancy • Launched Jan 19, 2015 • Completed Feb 16, 2015 – 4 weeks – 10,275 document annotation events – 212 unique users
  20. Comparison to gold standard: k = 6, F score = 0.84. Total cost: $0 [Plot: precision, recall and voting threshold; axes from 0 to 1]
  21. Does Citizen Science scale? Volunteers needed = (number of annotation events per year) / (number of annotation events per year per volunteer) = (1,000,000 articles × 10 AE / article) / [(10,275 AE × 365 days) / (212 annotators × 28 days)] ≈ 15,828 volunteers needed, where AE = annotation events
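
The volunteer estimate on the slide above extrapolates the 28-day campaign rate to a full year. A sketch of that calculation under the slide's own assumptions (every volunteer keeps contributing at the campaign's average rate all year); the function and parameter names are hypothetical:

```python
def volunteers_needed(articles_per_year, events_per_article,
                      campaign_events, campaign_volunteers, campaign_days):
    """Volunteers required if each sustains the campaign's average rate."""
    events_needed_per_year = articles_per_year * events_per_article
    # Average annotation events per volunteer, scaled from the campaign to 365 days.
    events_per_volunteer_per_year = (
        campaign_events * 365 / (campaign_volunteers * campaign_days)
    )
    return events_needed_per_year / events_per_volunteer_per_year


# Figures from the slides: 10,275 annotation events by 212 users in 28 days.
n = volunteers_needed(1_000_000, 10, 10_275, 212, 28)
print(round(n))
```

With the slide's figures this works out to roughly 632 annotation events per volunteer per year, and rounding the result reproduces the 15,828 shown.
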
  22. Does Citizen Science scale? 15,828 volunteers needed. [Comparison figures: 175,000 volunteers; 300,000 volunteers; 37,000 volunteers; 1,000,000 volunteers]
  23. Annotating the relationships: “This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.” [Annotation: subject (GENE), predicate “therapeutic target”, object (DISEASE)]
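
The subject / predicate / object labelling on the slide above maps onto a small record type. The sketch below is illustrative only: the class and field names are hypothetical, and pairing CDK9 with "hematologic malignancies" is an assumption, since the transcript does not show which gene and disease mentions were highlighted. Storing the source sentence alongside the triple is one way to keep the relationship traceable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One subject-predicate-object statement with its supporting text."""
    subject: str
    subject_type: str
    predicate: str
    obj: str
    object_type: str
    evidence: str


# Assumed pairing for illustration; the slide only labels GENE and DISEASE.
rel = Relation(
    subject="CDK9",
    subject_type="GENE",
    predicate="therapeutic target",
    obj="hematologic malignancies",
    object_type="DISEASE",
    evidence=(
        "Thus, inhibition of CDK9 may represent an interesting approach as a "
        "cancer therapeutic target especially in hematologic malignancies."
    ),
)
print(f"{rel.subject} ({rel.subject_type}) --[{rel.predicate}]--> "
      f"{rel.obj} ({rel.object_type})")
```
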
  24. Candidate genes: FLNB, CTNNB1, EPHA3, SMAD3, XPO1, RPS27, FLCN, ATR, FLT3, BRD2, ERG, RAF1, EGFR, ERBB4, RARA, JAK3, LRP1, WT1, PML, SMARCA4, …
  25. Mark2Cure: Ben Good, Max Nanis, Ginger Tsueng, Chunlei Wu, All Mark2Curators! • Other group members: Cyrus Afrasiabi, Sebastian Burgstaller, Ramya Gamini, Louis Gioia, Salvatore Loguercio, Adam Mark, Erick Scott, Greg Stupp, Andra Waagmeester, Kevin Xin • Matt and Cristina Might, NGLY1 community • Funding and Support: BioGPS: GM83924; Gene Wiki: GM089820; BD2K Center of Excellence: GM114833 • Icon credits (Noun Project, Wikimedia Commons): Zach VanDeHey, hunotika, Viktorvoigt, Alberto Rojas, Lloyd Humphreys • Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su
  26. Why do I Mark2Cure? • “I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.” • “My 4 year old daughter Phoebe is living with and battling rare disease.” • “I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.” • “Take part in something that helps humanity.” • “I Mark2Cure in memory of my son Mike who had type 1 diabetes.” • “Studied biology in college and I really miss it!” • “In memory of my daughter who had Cystic Fibrosis” • “To give back”