ISB2012: The Gene Wiki: Crowdsourcing human gene annotation

some animations don't adapt well to static slides -- download the ppt file to view...

Usage Rights

CC Attribution-ShareAlike License

  • Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization
  • Structured annotations enable pathway analysis, statistical analyses, cross-species comparisons
  • Transduction accounts for 70% of the concept recognition problems
  • Tried on 773 GO categories, significant in 356 cases (46%)
  • We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores
  • We started working with Doug Howe because he helped us learn a lot about biocuration, but clearly we’d need to expand partners. In particular, since GO curation seems to be largely drawn by organisms
  • Also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first have to convince you that there is a bottleneck in tool development.
  • Reverted four minutes later
  • Reverted four minutes later
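
The note above that "Transduction" accounts for most concept recognition problems can be illustrated with a toy dictionary matcher. This is only a sketch: the two-entry lexicon, the `recognize` function, and the longest-match-first strategy are assumptions made for illustration, and it does not reproduce how the NCBO Annotator actually works. It shows why the OR2F1 sentence quoted in the transcript yields Transduction (GO:0009293) even though the sentence is about signal transduction: the phrase "signal transduction" never appears contiguously.

```python
import re

# Toy lexicon (assumption): only the two GO labels discussed in the talk.
LEXICON = {
    "signal transduction": "GO:0007165",
    "transduction": "GO:0009293",
}

def recognize(text, lexicon=LEXICON):
    """Return (label, GO id) hits using longest-match-first, whole-word matching."""
    hits = []
    remaining = text.lower()
    for label in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, remaining):
            hits.append((label, lexicon[label]))
            # Mask the matched span so shorter labels cannot match inside it.
            remaining = re.sub(pattern, " ", remaining)
    return hits

OR2F1_SENTENCE = (
    "Olfactory receptors … are responsible for the recognition and "
    "G protein-mediated transduction of odorant signals."
)

print(recognize(OR2F1_SENTENCE))
print(recognize("Signal transduction begins with reception of a signal"))
```

In the first call only the bare word "transduction" is present, so the matcher has no way to prefer the signaling term. The second call shows that longest-match-first helps only when the full label occurs verbatim.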

Presentation Transcript

  • The Gene Wiki: Crowdsourcing human gene annotation. Andrew Su, Ph.D., Department of Molecular and Experimental Medicine, The Scripps Research Institute. Biocuration 2012, April 2, 2012
  • The Long Tail is a prolific source of content. (Chart: content produced vs. contributors (sorted), with a Short Head and a Long Tail.) Short head vs. long tail examples: News: newspapers vs. blogs; Video: TV/Hollywood vs. YouTube; Product reviews: Consumer Reports vs. Amazon reviews; Food reviews: food critics vs. Yelp; Talent judging: Olympics vs. American Idol; Gene annotation: manual curation vs. Gene Wiki
  • We can harness the Long Tail of scientists to directly participate in the gene annotation process.
  • Wikipedia is reasonably accurate
  • Wikipedia has breadth and depth. (Chart: articles and words (millions), Wikipedia vs. Britannica Online.) Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
  • Filtering, extracting, and summarizing PubMed (diagram labels: Documents, Concepts)
  • Wiki success depends on positive feedback (diagram: Gene Wiki page utility, number of users, number of contributors; values shown: 1, 100, 2, 200)
  • 10,000 gene “stubs” within Wikipedia (feedback loop: utility, users, contributors). Stub contents: protein structure; gene summary; symbols and identifiers; Gene Ontology annotations; protein interactions; tissue expression pattern; linked references; links to structured databases. Huss, PLoS Biol, 2008
  • Gene Wiki has a critical mass of readers (feedback loop: utility, users, contributors). Total: ~4.3 million views / month. Huss, PLoS Biol, 2008; Good, NAR, 2011
  • Gene Wiki has a critical mass of editors (feedback loop: utility, users, contributors). ~10,000 words added / month; total 1.42 million words ≈ 230 full-length articles; 4.3 million views / month; 1000 edits / month. (Chart: cumulative edits, productive edits, vandalism.) Good, NAR, 2011
  • A review article for every gene is powerful. Reelin: 68 editors, 543 edits since July 2002; Heparin: 175 editors, 320 edits since June 2003; AMPK: 44 editors, 84 edits since March 2004; RNAi: 232 editors, 708 edits since October 2002. References to the literature; hyperlinks to related concepts
  • Making the Gene Wiki more computable (free text to structured annotations)
  • Filling the gaps in gene annotation. NCBI Entrez Gene: 3362; Gene Wiki mapping; wikilink; candidate assertion: GO:0004993 (GO exact synonym)
  • Filling the gaps in gene annotation. NCBI Entrez Gene: 334; Gene Wiki mapping; wikilink; candidate assertion: GO:0006897 (GO exact match)
  • Disease associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Pipeline: Gene Wiki articles (10,271) → filter out seeded text → NCBO Annotator → matched Disease Ontology terms (2983) → compare to DO database → 2147 candidate annotations. Comparison outcome: 23% exact match, 5% match parent, 2% match child, 70% have no match
  • Disease associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Expert curation: Correct 86%, Incorrect 10%, Maybe 4%. Overall specificity: 90-93%
  • GO associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Pipeline: Gene Wiki articles (10,271) → filter out seeded text → NCBO Annotator → matched Gene Ontology terms (11,022) → compare to GO database → 6319 candidate annotations. Comparison outcome: 17% exact match, 26% match parent, 2% match child, 55% have no match
  • GO associations mined from the Gene Wiki (Good, BMC Genomics 2011, 12:603). Expert curation (chart labels as extracted: Correct 14% Maybe 60% 26% Incorrect). Overall specificity: 48-64%
  • Common sources of error in GO associations (Good, BMC Genomics 2011, 12:603). 1) Incorrect concept recognition. OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.” Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process… Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector.
  • Common sources of error in GO associations (Good, BMC Genomics 2011, 12:603). 2) Incorrect sentence context. MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …” Terms shown around MEF2C: Dephosphorylation, Excretion, Phosphorylation, Gene expression, Glycosylation, Localization, Neurogenesis, Methylation, Proteolysis, Secretion, Transport, Myelination, Transcription, Translation
  • Novel GO annotations: so what? 11,022 annotations mined from Gene Wiki; ~100,000 annotations from GO consortium; 4703 (43%) match known annotations; 6319 “novel” annotations @ 48-64% specificity
  • Gene Wiki content improves enrichment analysis. Axon guidance (GO:0007411): GO term → gene list (264 genes); PubMed abstracts (811 articles) → concept recognition → enrichment analysis. Contingency table (linked genes through PubMed vs. GO:0007411): linked Yes: 13 in GO term, 2 not; linked No: 251 in GO term, 12033 not. P = 1.55 E-20
  • Gene Wiki content improves enrichment analysis. Muscle contraction (GO:0006936): GO term → gene list (87 genes); PubMed abstracts (251 articles) + Gene Wiki (87 articles) → concept recognition → enrichment analysis. Linked genes through PubMed: P = 1.0; linked genes through PubMed + Gene Wiki: P = 1.22 E-09
  • Gene Wiki content improves enrichment analysis. (Scatter plot: p-value (PubMed + GW) vs. p-value (PubMed only); regions labeled "More significant PubMed only" and "More significant PubMed + GW"; muscle contraction highlighted.)
  • Challenges and future directions: How to complement and integrate with traditional biocuration workflows? How to disseminate and utilize crowdsourced annotations?
  • The Long Tail of scientists is a valuable source of information on gene function
  • Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project. Group members: Erik Clarke, Ian Macleod, Ben Good (*), Chunlei Wu, Salvatore Loguercio. See poster # 30 for more on the Gene Wiki and crowdsourcing in biology! Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su. Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)
  • Making the Gene Wiki more reliable. Example text: "Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …" followed by "The company name is derived from old Greek, and means "destroyer of birds"."
  • Making the Gene Wiki more reliable. Same example text with author trust shown: "Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …" (high-trust author, 36211 total edits); "The company name is derived from old Greek, and means "destroyer of birds"." (low-trust author, 36 total edits). http://www.wikitrust.net/
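
The contingency table on the axon guidance (GO:0007411) slide can be turned into an enrichment p-value with a few lines of code. The slides do not say which test was used, so this sketch assumes a one-sided Fisher's exact test (hypergeometric tail); the function name `fisher_right_tail` and its argument layout are illustrative choices, not something taken from the talk. The result can be compared against the P = 1.55 E-20 reported on the slide.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` overlapping genes if the two labels were independent.
    """
    row1 = a + b            # genes linked to the term through the literature
    col1 = a + c            # genes annotated with the term in the gene list
    total = a + b + c + d   # all genes considered
    k_max = min(row1, col1)
    # Count tables at least as extreme with exact integers, then normalize.
    favorable = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, k_max + 1)
    )
    return favorable / comb(total, row1)

# Counts from the axon guidance slide:
# linked Yes: 13 in GO term, 2 not; linked No: 251 in GO term, 12033 not.
p = fisher_right_tail(13, 2, 251, 12033)
print(f"P = {p:.2e}")
```

Exact integer binomials are used instead of floating-point factorials so that the very small tail probability does not underflow or lose precision before the final division.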