Open biomedical knowledge using crowdsourcing and citizen science

Talk given at UCSD's Genetics & Genomics / Bioinformatics & Systems Biology joint seminar series on November 5, 2015.

Published in: Science

  1. 1. Open biomedical knowledge using crowdsourcing and citizen science Andrew Su, Ph.D. @andrewsu asu@scripps.edu http://sulab.org November 5, 2015 UCSD Slides: slideshare.net/andrewsu
  2. 2. Candidate genes: FLNB, CTNNB1, EPHA3, SMAD3, XPO1, RPS27, FLCN, ATR, FLT3, BRD2, ERG, RAF1, EGFR, ERBB4, RARA, JAK3, LRP1, WT1, PML, SMARCA4, … Candidate variants: chr1:g.156084782C>G, chr6:g.31911991G>T, chr19:g.3767338C>T, chr19:g.3783925C>T, chr7:g.552021G>A, chr3:g.123005609G>T, …
  3. 3. 3 Biology is an INFORMATION science Pietro Bellini https://flic.kr/p/k5jmja
  4. 4. Prioritization of human genetic variants: from 1000s of genetic variants to < 10 candidate genes. Filters: variant type; allele frequencies; previous clinical observation; predicted functional effects; gene function; …
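
The filter cascade on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` fields, the 1% allele-frequency cutoff, the list of kept variant types, and the toy records (with placeholder gene names) are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical annotation fields, chosen to mirror the slide's filters.
    hgvs_id: str
    gene: str
    variant_type: str
    allele_frequency: float
    predicted_damaging: bool


def prioritize(variants, max_allele_frequency=0.01,
               keep_types=("missense", "nonsense", "frameshift")):
    """Apply each filter in turn and return the surviving gene symbols, sorted."""
    kept = [
        v for v in variants
        if v.variant_type in keep_types                  # variant type
        and v.allele_frequency <= max_allele_frequency   # allele frequency
        and v.predicted_damaging                         # predicted functional effect
    ]
    return sorted({v.gene for v in kept})


# Toy records with made-up values, for illustration only.
toy_variants = [
    Variant("chr1:g.100A>G", "GENE_A", "missense", 0.0005, True),
    Variant("chr2:g.200C>T", "GENE_B", "synonymous", 0.0001, False),
    Variant("chr3:g.300G>A", "GENE_C", "missense", 0.2500, True),
]

candidate_genes = prioritize(toy_variants)
print(candidate_genes)
```

Only the first toy record passes all three filters; the second fails on variant type and the third on allele frequency.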
  5. 5. Data integration as a cottage industry 5 dbNSFP
  6. 6. Data integration as hardened community software 6 dbNSFP MyVariant.info
  7. 7. MyGene.info for integrating gene annotations 7 Gene MyGene.info
  8. 8. MyGene.info for integrating gene annotations 8 http://mygene.info/metadata Current version history Current stats
  9. 9. MyGene.info for integrating gene annotations. Figure: histogram of request time (ms) for the gene annotation service (/v2/gene). Frequency by bin label: 10: 399070; 20: 210381; 30: 120173; 40: 22249; 50: 7292; 60: 3563; 70: 1767; 80: 1031; 90: 616; 100: 406; More: 2724.
  10. 10. MyGene.info for integrating gene annotations 10 2 ~ 3M requests per month
  11. 11. MyGene.info for integrating gene annotations 11
  12. 12. MyGene.info for integrating gene annotations 12 2015 – 2018
  13. 13. Bioinformatician-friendly JSON output, REST API: http://MyGene.info/v2/gene/7157 and http://MyVariant.info/v1/variant/chr7:g.55241707G>T
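
A minimal sketch of how the two example URLs on this slide could be built and fetched from Python using only the standard library. The path versions (`/v2/gene`, `/v1/variant`) are copied from the slide and may since have changed; the helper names `gene_url`, `variant_url`, and `fetch_json` are illustrative, not part of any official client.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base paths as they appear on the slide (2015); newer API versions may differ.
MYGENE_BASE = "http://mygene.info/v2/gene/"
MYVARIANT_BASE = "http://myvariant.info/v1/variant/"


def gene_url(entrez_gene_id):
    """URL for one gene, addressed by its Entrez Gene ID (e.g. 7157)."""
    return f"{MYGENE_BASE}{entrez_gene_id}"


def variant_url(hgvs_id):
    """URL for one variant, addressed by its HGVS genomic ID.

    The '>' in the HGVS ID is percent-encoded; ':' is left as-is.
    """
    return MYVARIANT_BASE + quote(hgvs_id, safe=":")


def fetch_json(url):
    """Fetch a URL and parse the response body as JSON (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


print(gene_url(7157))
print(variant_url("chr7:g.55241707G>T"))
```

Calling `fetch_json(gene_url(7157))` would then return the annotation document as a Python dictionary, assuming the service is reachable at that path.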
  14. 14. Variant and gene prioritization 14
  15. 15. Variant and gene prioritization 15 2441 2308 1917 18 9 5
  16. 16. Variant and gene prioritization 2441 2308 1917 18 9 5 https://github.com/SuLab/myvariant.info/blob/master/docs/ipynb/myvariant_R_miller.ipynb
  17. 17. Open biomedical knowledge 17 MyVariant.info MyGene.info Integration of molecular biology databases via high performance APIs
  18. 18. Open biomedical knowledge 18 MyVariant.info MyGene.info Integration of molecular biology databases via high performance APIs Biomedical Linked Open Data
  19. 19. The Gene Wiki project 19 Protein structure Symbols and identifiers Tissue expression pattern Gene Ontology annotations Links to structured databases Gene summary Protein interactions Linked references Huss, PLoS Biol, 2008
  20. 20. The Gene Wiki project 20
  21. 21. The Gene Wiki project 21
  22. 22. Wikidata 22 Provide a database of the world’s knowledge that anyone can edit - Denny Vrandečić
  23. 23. Centralizing key data storage 23 Source: http://commons.wikimedia.org/wiki/File:Wikidata_slides_Magnus_Manske,_Cambridge,_2014-02-27.pdf
  24. 24. Centralizing key data storage 24
  25. 25. Centralizing key data storage 25
  26. 26. Loading biological data into Wikidata 26 Entrez Gene Ensembl UniProt UCSC PDB RefSeq
  27. 27. Wikidata for biology: Reelin (Q414043, http://www.wikidata.org/wiki/Q414043) is a (Property:P31) Protein (Q8054) and Glycoprotein (Q187126); regulates (Property:P128) Neural development (Q1345738); interacts with (Property:P129) VLDL receptor (Q1979313) and Amyloid precursor protein (Q423510)
  28. 28. Wikidata for biology 28 Property:P31 Property:P128 Property:P129 Q8054 Q187126 Q1345738 Q1979313 Q423510 Q414043 http://wikidata.org/w/api.php?action=wbgetentities&ids=Q414043&languages=en
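
As a rough sketch, the `wbgetentities` request on this slide can be assembled with the standard library, and the item IDs attached to a property can be read out of the returned JSON. The added `format=json` parameter and the helper names are assumptions made for this example; the sample document below is a hand-written fragment that imitates the shape of a response, not real output from the API.

```python
from urllib.parse import urlencode

WIKIDATA_API = "http://wikidata.org/w/api.php"  # endpoint as written on the slide


def entity_url(qid, language="en"):
    """Build a wbgetentities request URL for one Wikidata item."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "languages": language,
        "format": "json",  # assumption: request JSON explicitly
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def claimed_item_ids(response, qid, property_id):
    """Return the Q-ids that `qid` points to through `property_id`."""
    statements = response["entities"][qid].get("claims", {}).get(property_id, [])
    ids = []
    for statement in statements:
        snak = statement["mainsnak"]
        if snak.get("snaktype") == "value":
            ids.append("Q" + str(snak["datavalue"]["value"]["numeric-id"]))
    return ids


# Hand-written fragment shaped like a wbgetentities response (illustrative only).
sample = {
    "entities": {
        "Q414043": {
            "claims": {
                "P31": [
                    {"mainsnak": {"snaktype": "value", "property": "P31",
                                  "datavalue": {"value": {"entity-type": "item",
                                                          "numeric-id": 8054}}}}
                ]
            }
        }
    }
}

print(entity_url("Q414043"))
print(claimed_item_ids(sample, "Q414043", "P31"))
```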
  29. 29. 29 ~150k genes and proteins ~2k FDA-approved drugs ~7k human diseases
  30. 30. Centralizing key data storage 30 287 language editions of Wikipedia Bioinformatics community Toxicology community Epidemiology community … …
  31. 31. Open biomedical knowledge 31 MyVariant.info MyGene.info Integration of molecular biology databases via high performance APIs Biomedical Linked Open Data
  32. 32. Open biomedical knowledge 32 Free text to structured data MyVariant.info MyGene.info Integration of molecular biology databases via high performance APIs Biomedical Linked Open Data
  33. 33. The biomedical literature is massive… Figure: number of new PubMed-indexed articles per year, 1983 to 2013 (y-axis from 0 to 1,200,000)
  34. 34. … but it is very hard to query and compute 34
  35. 35. … but it is very hard to query and compute 35 Imatinib Crizotinib Erlotinib Gefitinib Sorafenib Lapatinib Dasatinib … Acute myeloid leukemia Acute lymphoblastic leukemia Chronic myelogenous leukemia Chronic lymphocytic leukemia Hodgkin lymphoma Non-Hodgkin lymphoma Myeloma … AND
  36. 36. The Network of BioThings 36 1. Identify biomedical concepts in text … We report a case of familial systemic mastocytosis with the rare KIT K509I germ line mutation. In vitro treatment with imatinib, dasatinib and PKC412 reduced cell viability of primary mast cells harboring KIT K509I mutation. Both patients with familial systemic mastocytosis had remarkable hematological and skin improvement after three months of imatinib treatment. Leuk Res. 2014 Oct;38(10):1245-51. doi: 10.1016/j.leukres. GENES DISEASES DRUGS VARIANTS
  37. 37. The Network of BioThings 37 imatinib dasatinib PKC412 Familial systemic mastocytosis KIT K509I 1. Identify biomedical concepts in text 2. Identify relationships between concepts Mutation of Mutation causes causes treats inhibits
  38. 38. 38 Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
  39. 39. Question: Can Citizen Scientists collectively perform concept recognition in biomedical texts? 39
  40. 40. Simple annotation interface 40 Click to see instructions Highlight disease mentions 15 workers annotate each abstract
  41. 41. Experts versus crowd for concept identification: 593 PubMed abstracts; 6,900 mentions of “disease concepts”; F = 0.87; F = 0.78; $$$
  42. 42. Experts versus crowd for concept identification: 593 PubMed abstracts; 6,900 mentions of “disease concepts”; F = 0.87; F = 0.87; $$$ • 9 days • 145 workers • Total: $630.96
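
The F values on these slides are F-measures, the harmonic mean of precision and recall. The sketch below shows one plausible way to combine several workers' highlights by vote count and score the result against a gold standard. The voting rule, the `min_votes` threshold, and the toy character spans are assumptions for illustration; the slides do not say how the crowd annotations were actually merged.

```python
from collections import Counter


def aggregate(worker_annotations, min_votes):
    """Keep every span highlighted by at least `min_votes` workers."""
    votes = Counter(
        span
        for spans in worker_annotations
        for span in set(spans)  # each worker counts at most once per span
    )
    return {span for span, count in votes.items() if count >= min_votes}


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over exact span matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data: spans are (start, end) character offsets, values are made up.
gold = {(0, 5), (10, 20), (30, 42)}
workers = [
    [(0, 5), (10, 20)],
    [(0, 5), (30, 42), (50, 55)],
    [(0, 5), (10, 20), (50, 55)],
]

crowd = aggregate(workers, min_votes=2)
score = f_score(crowd, gold)
print(sorted(crowd), score)
```

Raising `min_votes` trades recall for precision, which is one reason a threshold like this would normally be tuned on held-out data.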
  43. 43. Does Mechanical Turk scale? 1,000,000 articles per year × 10 annotators / article × 4 tasks / doc × $0.066 / task = $2,640,000 / year
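
The cost estimate on this slide is a straight product of four factors. A small sketch that reproduces it, using `Decimal` so the dollar amount is exact; the function and parameter names are made up for this example.

```python
from decimal import Decimal


def annual_cost(articles_per_year, annotators_per_article,
                tasks_per_doc, dollars_per_task):
    """Yearly cost in dollars of paying for every annotation task."""
    total_tasks = articles_per_year * annotators_per_article * tasks_per_doc
    return total_tasks * Decimal(dollars_per_task)


# Figures taken from the slide.
cost = annual_cost(1_000_000, 10, 4, "0.066")
print(f"${cost:,.2f} per year")
```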
  44. 44. 44 http://mark2cure.org
  45. 45. Paid crowdsourcing ($$$): F = 0.87 • 9 days • 145 workers • Total: $630.96. Citizen Science (“Help science, please”): F = 0.84 • 28 days • 212 workers • Total cost: $0
  46. 46. Does Citizen Science scale? Volunteers needed = (number of annotation events per year) / (number of annotation events per year per volunteer) = (1,000,000 articles × 10 AE / article) / ((10,275 AE × 365 days) / (212 annotators × 28 days)) = 15,828 volunteers needed. AE = annotation events.
  47. 47. Does Citizen Science scale? 47 15,828 volunteers needed 175,000 volunteers 300,000 volunteers 37,000 volunteers 1,000,000 volunteers
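
The figure of 15,828 volunteers can be reproduced from the numbers on the "Does Citizen Science scale?" slides: the yearly demand for annotation events (AE) divided by the yearly output of one volunteer, extrapolated from the 28-day experiment. The function and parameter names below are illustrative only.

```python
def volunteers_needed(articles_per_year, ae_per_article,
                      observed_ae, observed_annotators, observed_days):
    """Volunteers required if each one keeps up the observed annotation rate."""
    ae_needed_per_year = articles_per_year * ae_per_article
    # Extrapolate the observed rate to one volunteer over a full year.
    ae_per_volunteer_per_year = (
        observed_ae * 365 / (observed_annotators * observed_days)
    )
    return ae_needed_per_year / ae_per_volunteer_per_year


# Figures taken from the slide.
estimate = volunteers_needed(
    articles_per_year=1_000_000,
    ae_per_article=10,
    observed_ae=10_275,
    observed_annotators=212,
    observed_days=28,
)
print(round(estimate))
```

The extrapolation assumes that volunteers sustain the pilot's average rate all year, which is an optimistic simplification.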
  48. 48. Annotating the relationships 48 This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies. therapeutic target subject predicate object GENE DISEASE
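
A subject, predicate, object annotation like the one on this slide maps naturally onto a small typed record. The sketch below is one possible representation, not the project's actual data model: the class names, the `source` field, and the choice of acute myeloid leukemia as the object are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Concept(NamedTuple):
    text: str
    kind: str  # e.g. "GENE", "DISEASE", "DRUG", "VARIANT"


class Relation(NamedTuple):
    subject: Concept
    predicate: str
    object: Concept
    source: str  # where the statement came from, to keep it traceable


# One illustrative reading of the passage on the slide.
relation = Relation(
    subject=Concept("CDK9", "GENE"),
    predicate="therapeutic target",
    object=Concept("acute myeloid leukemia", "DISEASE"),
    source="slide 48 passage",
)

# Index relations by subject text so a network can be queried by concept.
network = defaultdict(list)
network[relation.subject.text].append(relation)

for rel in network["CDK9"]:
    print(rel.subject.text, "|", rel.predicate, "|", rel.object.text)
```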
  49. 49. 49 Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
  50. 50. 50 Nina Hale https://flic.kr/p/zoVih
  51. 51. Rare disease case study #1 51 Photo: Retta Beery
  52. 52. 52 Bainbridge et al., STM, 2011
  53. 53. 53 Photo: Retta Beery
  54. 54. Rare disease case study #2 54
  55. 55. 55
  56. 56. 56 … but no obvious treatments
  57. 57. 57 Bainbridge et al., STM, 2011 SPR
  58. 58. What differentiates SPR and NGLY1? 58 SPR
  59. 59. 59 Sarah Olmstead https://flic.kr/p/364dZW NGLY1
  60. 60. 60 NGLY1 (11 PubMed articles) Congenital disorders of glycosylation (822) PNGase (686) ERAD (1330) glycosylation (48,862) alacrima (164) Genetic interactors (3016) symptoms (109,928) 24 million articles in PubMed
  61. 61. Mapping the biomedical network around NGLY1 61 NGLY1
  62. 62. 62
  63. 63. 63 A preliminary view of the NGLY1- focused biological network
  64. 64. Why do I Mark2Cure? 64 I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use. Sounds like a perfect situation for me. My 4 year old daughter Phoebe is living with and battling rare disease. I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care. Take part in something that helps humanity. I Mark2Cure in memory of my son Mike who had type 1 diabetes. Studied biology in college and I really miss it! In memory of my daughter who had Cystic Fibrosis Give back
  65. 65. Open biomedical knowledge 65 Free text to structured data MyVariant.info MyGene.info Integration of molecular biology databases via high performance APIs Biomedical Linked Open Data
  66. 66. Contact: http://sulab.org, asu@scripps.edu, @andrewsu. Slides: slideshare.net/andrewsu. Join the team! http://bit.ly/JoinSuLab. Gene Wiki / Wikidata: Ben Good, Sebastian Burgstaller, Tim Putman, Julia Turner, Ginger Tsueng, Andra Waagmeester, Elvira Mitraka (UMB), Lynn Schriml (UMB), Justin Leong (UBC), Paul Pavlidis (UBC). MyGene / MyVariant: Chunlei Wu, Cyrus Afrasiabi, Kevin Xin, Adam Mark. Mark2Cure: Max Nanis, Ginger Tsueng, Jennifer Fouquier, Ben Good, Chunlei Wu, and all Mark2Curators! Other group members: Jake Bruggemann, Ramya Gamini, Karthik Gangavarapu, Louis Gioia, Toby Li, Greg Stupp. Funding and support: BioGPS: GM83924; Gene Wiki: GM089820; MyGene / MyVariant: HG008473; BD2K COE: GM114833. Icon credits (Noun Project, Wikimedia Commons): Zach VanDeHey, hunotika, Viktorvoigt, Alberto Rojas, Lloyd Humphreys.
