ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

some animations don't adapt well to static slides -- download the ppt file to view...

Published in: Health & Medicine, Technology
Notes
  • Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, and summarization
  • Structured annotations enable pathway analysis, statistical analyses, cross-species comparisons
  • Transduction accounts for 70% of the concept recognition problems
  • Tried on 773 GO categories, significant in 356 cases (46%)
  • We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores
  • We started working with Doug Howe because he helped us learn a lot about biocuration, but clearly we’d need to expand partners. In particular, since GO curation seems to be largely drawn by organisms
  • Also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first have to convince you that there is a bottleneck in tool development.
  • Reverted four minutes later
  • ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

    1. The Gene Wiki: Crowdsourcing human gene annotation. Andrew Su, Ph.D., Department of Molecular and Experimental Medicine, The Scripps Research Institute. Biocuration 2012, April 2, 2012.
    2. The Long Tail is a prolific source of content. [Figure: content produced vs. contributors (sorted), with a Short Head and a Long Tail.] Short Head vs. Long Tail examples:
        • News: Newspapers vs. Blogs
        • Video: TV/Hollywood vs. YouTube
        • Product reviews: Consumer Reports vs. Amazon reviews
        • Food reviews: Food critics vs. Yelp
        • Talent judging: Olympics vs. American Idol
        • Gene annotation: Manual curation vs. Gene Wiki
    3. We can harness the Long Tail of scientists to directly participate in the gene annotation process.
    4. Wikipedia is reasonably accurate
    5. Wikipedia has breadth and depth. [Figure: articles and words (millions), Wikipedia vs. Britannica Online.] Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
    6. Filtering, extracting, and summarizing PubMed. [Figure labels: Documents, Concepts.]
    7. Wiki success depends on a positive feedback. [Figure: Gene wiki page utility, number of contributors, number of users.]
    8. 10,000 gene “stubs” within Wikipedia. [Feedback diagram: Utility, Users, Contributors.] Each stub includes: (Huss, PLoS Biol, 2008)
        • Protein structure
        • Gene summary
        • Symbols and identifiers
        • Gene Ontology annotations
        • Protein interactions
        • Tissue expression pattern
        • Linked references
        • Links to structured databases
    9. Gene Wiki has a critical mass of readers. [Feedback diagram: Utility, Users, Contributors.] Total: ~4.3 million views / month. (Huss, PLoS Biol, 2008; Good, NAR, 2011)
    10. Gene Wiki has a critical mass of editors. [Feedback diagram: Utility, Users, Contributors. Plot: cumulative edits, split into productive edits and vandalism.] (Good, NAR, 2011)
        • ~10,000 words added / month
        • Total 1.42 million words ≈ 230 full-length articles
        • 4.3 million views / month
        • 1000 edits / month
    11. A review article for every gene is powerful
        • Reelin: 68 editors, 543 edits since July 2002
        • Heparin: 175 editors, 320 edits since June 2003
        • AMPK: 44 editors, 84 edits since March 2004
        • RNAi: 232 editors, 708 edits since October 2002
        • References to the literature
        • Hyperlinks to related concepts
    12. Making the Gene Wiki more computable: Free text → Structured annotations
    13. Filling the gaps in gene annotation. NCBI Entrez Gene: 3362. Gene Wiki mapping, Wikilink. Candidate assertion: GO:0004993 (GO exact synonym).
    14. Filling the gaps in gene annotation. NCBI Entrez Gene: 334. Gene Wiki mapping, Wikilink. Candidate assertion: GO:0006897 (GO exact match).
    15. Disease associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603)
        • Pipeline: Gene Wiki articles (10,271) → filter out seeded text → NCBO Annotator → matched Disease Ontology terms (2983) → compare to DO database → 2147 candidate annotations
        • Comparison with DO database: 23% exact match, 5% match parent, 2% match child, 70% have no match
    16. Disease associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Expert curation: Correct 86%, Incorrect 10%, Maybe 4%. Overall specificity: 90-93%
    17. GO associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603)
        • Pipeline: Gene Wiki articles (10,271) → filter out seeded text → NCBO Annotator → matched Gene Ontology terms (11,022) → compare to GO database → 6319 candidate annotations
        • Comparison with GO database: 17% exact match, 26% match parent, 2% match child, 55% have no match
    18. GO associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Expert curation (labels and values as extracted): Correct 14% Maybe 60% 26% Incorrect. Overall specificity: 48-64%
    19. Common sources of error in GO associations (Good, BMC Genomics 2011, 12:603). 1) Incorrect concept recognition. OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.”
        • Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…
        • Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector.
    20. Common sources of error in GO associations (Good, BMC Genomics 2011, 12:603). 2) Incorrect sentence context. MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …” [Figure: MEF2C shown with the terms Dephosphorylation, Excretion, Phosphorylation, Gene expression, Glycosylation, Localization, Neurogenesis, Methylation, Proteolysis, Secretion, Transport, Myelination, Transcription, Translation.]
    21. Novel GO annotations – so what?
        • 11,022 annotations mined from Gene Wiki
        • ~100,000 annotations from GO consortium
        • 4703 (43%) match known annotations
        • 6319 “novel” annotations @ 48-64% specificity
    22. Gene Wiki content improves enrichment analysis. GO term: axon guidance (GO:0007411). Concept recognition in PubMed abstracts: 811 articles. Gene list: 264 genes. Enrichment analysis:
                                          GO:0007411 Yes    GO:0007411 No
        Linked genes through PubMed: Yes        13                2
        Linked genes through PubMed: No        251            12033
        P = 1.55E-20
    23. Gene Wiki content improves enrichment analysis. GO term: muscle contraction (GO:0006936). Concept recognition in PubMed abstracts: 251 articles. Gene list: 87 genes. Gene Wiki: 87 articles. Enrichment analysis:
        • GO:0006936, linked genes through PubMed: P = 1.0
        • GO:0006936, linked genes through PubMed + Gene Wiki: P = 1.22E-09
    24. Gene Wiki content improves enrichment analysis. [Scatter plot: p-value (PubMed + GW) vs. p-value (PubMed only); regions labeled "More significant PubMed + GW" and "More significant PubMed only"; highlighted point: Muscle contraction.]
    25. Challenges and future directions
        • How to complement and integrate with traditional biocuration workflows?
        • How to disseminate and utilize crowdsourced annotations?
    26. The Long Tail of scientists is a valuable source of information on gene function
    27. Acknowledgments
        • Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; Many Wikipedia editors; WP:MCB Project
        • Group members: Erik Clarke, Ian Macleod, Ben Good (*), Chunlei Wu, Salvatore Loguercio
        • See poster # 30 for more on the Gene Wiki and crowdsourcing in biology!
        • Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su
        • Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)
    28. Making the Gene Wiki more reliable. Example article text: “Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …” followed by “The company name is derived from old Greek, and means "destroyer of birds".”
    29. Making the Gene Wiki more reliable. Same example text as slide 28, annotated by author trust (http://www.wikitrust.net/). High-trust author: 36211 total edits. Low-trust author: 36 total edits.
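
The 2×2 table on slide 22 can be checked with a short calculation. The slides do not name the statistical test, so the sketch below assumes a one-sided Fisher's exact test (hypergeometric tail). The function name `fisher_exact_greater` is hypothetical, and only the four counts come from the slide.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    count in the top-left cell with all row and column totals fixed.
    """
    row1 = a + b            # genes linked to the term through the literature
    col1 = a + c            # genes annotated with the GO term
    total = a + b + c + d   # all genes in the table
    upper = min(row1, col1)
    # Sum the probability of every table at least as extreme as the observed one.
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, row1)


# Counts from slide 22 (axon guidance, GO:0007411):
# rows = linked through PubMed (yes / no), columns = annotated with the term (yes / no).
p_axon_guidance = fisher_exact_greater(13, 2, 251, 12033)
print(f"P = {p_axon_guidance:.2e}")
```

The printed value can be compared with the P = 1.55E-20 reported on the slide. Exact integer arithmetic via `math.comb` avoids underflow for p-values this small.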

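
Slides 15 and 17 sort each mined annotation into "exact match", "match parent", "match child", or "no match" against the existing annotation database. A minimal sketch of that comparison follows, under two assumptions not stated in the slides: "match parent" means the mined term is an ancestor of a term already annotated to the gene, and "match child" means it is a descendant. The term IDs and the `PARENTS` map are hypothetical toy data, not real ontology content.

```python
# Hypothetical toy ontology: term -> set of direct parent terms.
PARENTS = {
    "T:leaf": {"T:mid"},
    "T:mid": {"T:root"},
    "T:other": {"T:root"},
    "T:root": set(),
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term` by walking parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def classify(candidate, known_terms, parents=PARENTS):
    """Compare one mined term with the terms already annotated to a gene."""
    if candidate in known_terms:
        return "exact match"
    # The candidate is more general than an existing annotation.
    if any(candidate in ancestors(known, parents) for known in known_terms):
        return "match parent"
    # The candidate is more specific than an existing annotation.
    candidate_ancestors = ancestors(candidate, parents)
    if any(known in candidate_ancestors for known in known_terms):
        return "match child"
    return "no match"


print(classify("T:root", {"T:leaf"}))   # candidate is an ancestor of a known term
print(classify("T:other", {"T:leaf"}))  # unrelated branch
```

Under this reading, the "no match" bucket is the pool of candidate novel annotations that the deck then sends to expert curation.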