Tried on 773 GO categories, significant in 356 cases (46%)
We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores.
I also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first I have to convince you that there is a bottleneck in tool development.
Transcript of "ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation"
The Gene Wiki: Crowdsourcing human gene annotation Andrew Su, Ph.D. The Scripps Research Institute ISMB Special Session: Harnessing community intelligence for bioinformatics #ISMB #SS7 July 17, 2012
2 The Long Tail is a prolific source of content. [Figure: content produced vs. contributors (sorted), with Short Head and Long Tail regions.] Short Head vs. Long Tail examples: News: newspapers vs. blogs; Video: TV/Hollywood vs. YouTube; Product reviews: Consumer Reports vs. Amazon reviews; Food reviews: food critics vs. Yelp; Talent judging: Olympics vs. American Idol; Gene annotation: manual curation vs. Gene Wiki
3 We can harness the Long Tail of scientists to directly participate in the gene annotation process.
5 Wikipedia has breadth and depth. [Figure: articles and words (millions), Wikipedia vs. Britannica Online.] Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
Filtering, extracting, and summarizing PubMed (diagram labels: Documents, Concepts)
7 Wiki success depends on positive feedback. [Diagram: Gene Wiki page utility, number of contributors, number of users.]
8 10,000 gene “stubs” within Wikipedia. Feedback loop: Utility, Users, Contributors. Stub contents: protein structure; gene summary; symbols and identifiers; Gene Ontology annotations; protein interactions; tissue expression pattern; linked references; links to structured databases. Huss, PLoS Biol, 2008
9 Gene Wiki has a critical mass of readers. Feedback loop: Utility, Users, Contributors. Total: ~4.3 million views / month. Huss, PLoS Biol, 2008; Good, NAR, 2011
10 Gene Wiki has a critical mass of editors. Feedback loop: Utility, Users, Contributors. ~10,000 words added / month; total 1.42 million words ≈ 230 full-length articles; 4.3 million views / month; 1000 edits / month. [Figure: cumulative edits, split into productive edits and vandalism.] Good, NAR, 2011
11 A review article for every gene is powerful. Examples: Reelin: 98 editors, 703 edits since July 2002; Heparin: 358 editors, 654 edits since June 2003; AMPK: 109 editors, 203 edits since March 2004; RNAi: 394 editors, 994 edits since October 2002. Features: hyperlinks to related concepts; references to the literature
12 Making the Gene Wiki more computable: from free text to structured annotations
13 Filling the gaps in gene annotation. Good, BMC Genomics 2011, 12:603. Example: NCBI Entrez Gene 3362; Gene Wiki mapping; wikilink; Annotator; GO exact synonym; candidate assertion GO:0004993
14 Filling the gaps in gene annotation. Good, BMC Genomics 2011, 12:603. Example: NCBI Entrez Gene 334; Gene Wiki mapping; wikilink; Annotator; GO exact match; candidate assertion GO:0006897
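The two examples on slides 13 and 14 (a wikilink on a gene's page matched to a GO term name or exact synonym, yielding a candidate assertion) can be sketched roughly as below. This is a simplified illustration, not the pipeline from the paper: the slides name an "Annotator" step for concept recognition, whereas this sketch swaps in a plain dictionary lookup. The function name, the lookup table, the label strings, and the wikilink titles are all hypothetical; only the gene IDs and GO IDs come from the slides.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    gene_id: int      # NCBI Entrez Gene ID of the Gene Wiki page
    go_id: str        # GO term proposed as a candidate annotation
    match_type: str   # how the wikilink matched the GO term


def mine_candidates(gene_id, wikilinks, go_labels):
    """Return one candidate assertion per wikilink whose title matches a GO label."""
    candidates = []
    for title in wikilinks:
        hit = go_labels.get(title.strip().lower())
        if hit is not None:
            go_id, match_type = hit
            candidates.append(Candidate(gene_id, go_id, match_type))
    return candidates


# Hypothetical lookup table: the label strings are illustrative assumptions,
# only the GO IDs and match types appear on the slides.
GO_LABELS = {
    "endocytosis": ("GO:0006897", "exact match"),
    "serotonin receptor": ("GO:0004993", "exact synonym"),
}

# Hypothetical wikilink titles for the two gene pages shown on the slides.
print(mine_candidates(334, ["Endocytosis", "Amyloid"], GO_LABELS))
print(mine_candidates(3362, ["Serotonin receptor", "Brain"], GO_LABELS))
```

Links with no matching GO label are dropped without comment, which is why candidates mined this way still need checking against known annotations, as slide 15 does.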
15 Novel GO annotations – so what? Good, BMC Genomics 2011, 12:603. 11,022 annotations mined from Gene Wiki; 4703 (43%) match known annotations; 6319 “novel” annotations @ 48-64% specificity; ~100,000 annotations from GO consortium
16 Gene Wiki content improves enrichment analysis. Workflow: GO term (axon guidance, GO:0007411); PubMed abstracts (811 articles); concept recognition; gene list (264 genes); enrichment analysis. 2×2 contingency table, GO:0007411 (Yes/No) by genes linked through PubMed (Yes/No), in slide order: 13, 2 / 251, 12033. P = 1.55 E-20
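For readers who want to see how a 2×2 table like the one on slide 16 turns into a p-value, here is a minimal sketch. It assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The slide does not say which test was used, so that choice, the function name, and the parameter names are assumptions. The row/column orientation of the flattened table is also unclear, but this test gives the same result whichever way the table is read.

```python
from math import comb


def enrichment_p_value(overlap, linked_total, annotated_total, population):
    """One-sided hypergeometric tail: P(X >= overlap).

    X is the number of annotated genes among `linked_total` genes drawn
    from a population containing `annotated_total` annotated genes.
    """
    upper = min(linked_total, annotated_total)
    favorable = sum(
        comb(annotated_total, k)
        * comb(population - annotated_total, linked_total - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return favorable / comb(population, linked_total)


# Cell counts in the order they appear on the axon guidance slide.
a, b, c, d = 13, 2, 251, 12033

p = enrichment_p_value(
    overlap=a,
    linked_total=a + b,       # one margin of the table (assumed orientation)
    annotated_total=a + c,    # the other margin
    population=a + b + c + d,
)
print(f"one-sided p = {p:.3g}")
```

Using `math.comb` keeps the computation exact for counts of this size. A library routine such as SciPy's `fisher_exact` would be the more usual choice in practice.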
17 Gene Wiki content improves enrichment analysis. Workflow: GO term (muscle contraction, GO:0006936); PubMed abstracts (251 articles) + Gene Wiki (87 articles); concept recognition; gene list (87 genes); enrichment analysis. GO:0006936 vs. linked genes through PubMed: P = 1.0; GO:0006936 vs. linked genes through PubMed + Gene Wiki: P = 1.22 E-09
18 Gene Wiki content improves enrichment analysis. [Scatter plot: p-value (PubMed + GW) vs. p-value (PubMed only); regions labeled "More significant with PubMed only" and "More significant with PubMed + GW"; muscle contraction highlighted.]
19 Gene Wiki+ for integrative queries. mwsync. http://genewikiplus.org
20 Dynamic queries across genes, diseases, SNPs
27 Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project. Group members: Erik Clarke, Ian Macleod, Ben Good, Max Nanis, Salvatore Loguercio, Chunlei Wu. ISMB travel support. Contact: http://sulab.org, email@example.com, @andrewsu, +Andrew Su. Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)