ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

Note: several slides use animation, so for best display please download and view in PowerPoint.

  1. The Gene Wiki: Crowdsourcing human gene annotation. Andrew Su, Ph.D., The Scripps Research Institute. ISMB Special Session: Harnessing community intelligence for bioinformatics. #ISMB #SS7. July 17, 2012
  2. The Long Tail is a prolific source of content. Chart: content produced vs. contributors (sorted), with a Short Head and a Long Tail. Examples (short head, long tail): News: newspapers, blogs; Video: TV/Hollywood, YouTube; Product reviews: Consumer Reports, Amazon reviews; Food reviews: food critics, Yelp; Talent judging: Olympics, American Idol; Gene annotation: manual curation, Gene Wiki
  3. We can harness the Long Tail of scientists to directly participate in the gene annotation process.
  4. Wikipedia is reasonably accurate
  5. Wikipedia has breadth and depth. Chart: articles and words (millions), Wikipedia vs. Britannica Online. Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
  6. Filtering, extracting, and summarizing PubMed: documents, concepts
  7. Wiki success depends on positive feedback: Gene Wiki page utility, number of contributors, number of users. Figure labels: 1, 100, 2, 200
  8. 10,000 gene “stubs” within Wikipedia. Cycle: utility, users, contributors. Stub contents: protein structure; gene summary; symbols and identifiers; Gene Ontology annotations; protein interactions; tissue expression pattern; linked references; links to structured databases. (Huss, PLoS Biol, 2008)
  9. Gene Wiki has a critical mass of readers. Cycle: utility, users, contributors. Total: ~4.3 million views / month. (Huss, PLoS Biol, 2008; Good, NAR, 2011)
  10. Gene Wiki has a critical mass of editors. Cycle: utility, users, contributors. ~10,000 words added / month; total 1.42 million words ≈ 230 full-length articles; 4.3 million views / month; 1000 edits / month. Plot: cumulative edits, split into productive edits and vandalism. (Good, NAR, 2011)
  11. A review article for every gene is powerful. Reelin: 98 editors, 703 edits since July 2002; Heparin: 358 editors, 654 edits since June 2003; AMPK: 109 editors, 203 edits since March 2004; RNAi: 394 editors, 994 edits since October 2002. Callouts: hyperlinks to related concepts; references to the literature
  12. Making the Gene Wiki more computable: free text to structured annotations
  13. Filling the gaps in gene annotation (Good, BMC Genomics 2011, 12:603). NCBI Entrez Gene: 3362; Gene Wiki mapping; wikilink; annotator (GO exact synonym); GO:0004993; candidate assertion
  14. Filling the gaps in gene annotation (Good, BMC Genomics 2011, 12:603). NCBI Entrez Gene: 334; Gene Wiki mapping; wikilink; annotator (GO exact match); GO:0006897; candidate assertion
  15. Novel GO annotations – so what? (Good, BMC Genomics 2011, 12:603). 11,022 annotations mined from Gene Wiki; ~100,000 annotations from GO consortium; 4703 (43%) match known annotations; 6319 “novel” annotations @ 48-64% specificity
  16. Gene Wiki content improves enrichment analysis. GO term: axon guidance (GO:0007411). Pipeline: PubMed abstracts (811 articles); concept recognition; gene list (264 genes); enrichment analysis. 2×2 table of GO:0007411 (Yes/No) against linked genes through PubMed (Yes/No): 13, 2; 251, 12033. P = 1.55E-20
  17. Gene Wiki content improves enrichment analysis. GO term: muscle contraction (GO:0006936). Pipeline: PubMed abstracts (251 articles) + Gene Wiki (87 articles); concept recognition; gene list (87 genes); enrichment analysis. GO:0006936 vs. linked genes through PubMed: P = 1.0; GO:0006936 vs. linked genes through PubMed + Gene Wiki: P = 1.22E-09
  18. Gene Wiki content improves enrichment analysis. Scatter plot: p-value (PubMed only) vs. p-value (PubMed + GW). Regions: more significant with PubMed only; more significant with PubMed + GW. Highlighted point: muscle contraction
  19. Gene Wiki+ for integrative queries: mwsync. http://genewikiplus.org
  20. Dynamic queries across genes, diseases, SNPs
  21. (no text)
  22. TOP 100 GENES
  23. Gene Wiki+ for integrative queries: mwsync; OMIM; PharmGKB. Query: {{#ask: [[Category:Human_proteins]] [[is_associated_with:: <q>[[Category:Breast_cancer]]</q>]] [[HasSNP:: … <q>[[is_associated_with:: http://genewikiplus.org
  24. Gene Wiki+ for integrative queries: mwsync; OMIM; PharmGKB. http://genewikiplus.org
  25. The Long Tail of scientists is a valuable source of information on gene function
  26. Crowdsourcing a gene annotation portal
  27. Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project. Group members: Erik Clarke, Ian Macleod, Ben Good, Max Nanis, Salvatore Loguercio, Chunlei Wu. ISMB travel support. Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su. Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)
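
Slides 13 and 14 outline how candidate GO annotations are mined from Gene Wiki pages: a wikilink on a gene's page is matched against a GO term name or exact synonym, and each match becomes a candidate assertion. The Python sketch below illustrates only that matching step. The function name `candidate_assertions`, the one-entry label table, and the example wikilink list are hypothetical and are not taken from the pipeline in Good, BMC Genomics 2011; only Entrez Gene 334 and GO:0006897 appear on the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup from lower-cased GO term names / exact synonyms to GO IDs.
# A real pipeline would build this from the full ontology, not a literal dict.
GO_LABELS: Dict[str, str] = {
    "endocytosis": "GO:0006897",  # GO:0006897 is the ID shown on slide 14
}


def candidate_assertions(
    entrez_gene_id: int,
    wikilinks: List[str],
    go_labels: Dict[str, str],
) -> List[Tuple[int, str, str]]:
    """Return (gene ID, GO ID, matching wikilink) for exact label matches."""
    seen = set()
    assertions = []
    for link in wikilinks:
        # Normalize the link title the way a page title is usually compared:
        # trim, lower-case, and treat underscores as spaces.
        key = link.strip().lower().replace("_", " ")
        go_id = go_labels.get(key)
        if go_id is not None and go_id not in seen:
            seen.add(go_id)  # report each GO term once per gene
            assertions.append((entrez_gene_id, go_id, link))
    return assertions


# Made-up wikilinks for the page mapped to Entrez Gene 334.
links = ["Endocytosis", "Protein", "endocytosis"]
print(candidate_assertions(334, links, GO_LABELS))
```

Exact matching keeps the step simple to audit, at the cost of missing links whose titles differ from the GO label; slide 15 reports how the mined candidates compared with known annotations.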
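
Slide 16 shows a 2×2 table (13, 2; 251, 12033) with P = 1.55 E-20 for axon guidance (GO:0007411). The slides do not name the statistical test. A one-sided Fisher's exact (hypergeometric) test is a common choice for this kind of gene-list enrichment, so the Python sketch below assumes it. It also assumes that 13 is the count of genes that both carry the GO term and are linked through PubMed; the function name `enrichment_p_value` is hypothetical.

```python
from math import comb


def enrichment_p_value(both: int, go_only: int, linked_only: int, neither: int) -> float:
    """One-sided Fisher's exact test: P(overlap >= observed overlap).

    both        -- genes with the GO term AND in the literature-derived list
    go_only     -- genes with the GO term but not in the list
    linked_only -- genes in the list but without the GO term
    neither     -- all remaining genes
    """
    total = both + go_only + linked_only + neither
    annotated = both + go_only    # genes carrying the GO term
    linked = both + linked_only   # genes in the literature-derived list
    max_overlap = min(annotated, linked)
    # Hypergeometric upper tail, computed with exact integers before dividing.
    numerator = sum(
        comb(annotated, k) * comb(total - annotated, linked - k)
        for k in range(both, max_overlap + 1)
    )
    return numerator / comb(total, linked)


# Counts from slide 16 (orientation of the table is an assumption).
p = enrichment_p_value(13, 2, 251, 12033)
print(f"{p:.3g}")
```

The one-sided test is symmetric under transposing the table, so swapping the roles of the two off-diagonal counts does not change the result. Under these assumptions, the printed value can be compared with the P value on the slide.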
