ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation

The Gene Wiki: Crowdsourcing human gene annotation
Andrew Su, Ph.D.
The Scripps Research Institute

ISMB
Special Session: Harnessing community intelligence for bioinformatics
#ISMB #SS7

July 17, 2012

2
The Long Tail is a prolific source of content

[Figure: content produced (y-axis) against contributors, sorted (x-axis); the curve is divided into the Short Head and the Long Tail]

Domain            Short Head          Long Tail
News              Newspapers          Blogs
Video             TV/Hollywood        YouTube
Product reviews   Consumer Reports    Amazon reviews
Food reviews      Food critics        Yelp
Talent judging    Olympics            American Idol
Gene annotation   Manual curation     Gene Wiki

3

We can harness the Long Tail of scientists to directly participate in the gene annotation process.

4
Wikipedia is reasonably accurate

5
Wikipedia has breadth and depth

[Figure: bar charts comparing Wikipedia with Britannica Online by number of articles and by words (millions)]

Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008

Filtering, extracting, and summarizing PubMed

[Figure: documents mapped to concepts]

7
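Wiki success depends on a positive feedback loop

[Diagram: Gene Wiki page utility linked to the number of contributors and the number of users; example values pair 1 contributor with 100 users and 2 contributors with 200 users]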

8
10,000 gene “stubs” within Wikipedia

[Diagram: feedback loop among Utility, Users, and Contributors]

[Figure: annotated Gene Wiki stub; labeled components:]
Protein structure
Gene summary
Symbols and identifiers
Gene Ontology annotations
Protein interactions
Tissue expression pattern
Linked references
Links to structured databases

Huss, PLoS Biol, 2008

9
Gene Wiki has a critical mass of readers
[Diagram: feedback loop among Utility, Users, and Contributors]

Total: ~4.3 million views / month

Huss, PLoS Biol, 2008; Good, NAR, 2011

10
Gene Wiki has a critical mass of editors
[Diagram: feedback loop among Utility, Users, and Contributors]

~10,000 words added / month
Total: 1.42 million words ≈ 230 full-length articles
4.3 million views / month
1000 edits / month

[Figure: cumulative edits over time, showing productive edits and vandalism]

Good, NAR, 2011

11
A review article for every gene is powerful

Reelin: 98 editors, 703 edits since July 2002
Heparin: 358 editors, 654 edits since June 2003
AMPK: 109 editors, 203 edits since March 2004
RNAi: 394 editors, 994 edits since October 2002

Hyperlinks to related concepts
References to the literature

12
Making the Gene Wiki more computable

Free text → Structured annotations

13
Filling the gaps in gene annotation
Good, BMC Genomics 2011, 12:603

[Diagram: NCBI Entrez Gene: 3362 → Gene Wiki mapping → wikilink → GO exact synonym (Annotator) → candidate assertion: GO:0004993]

14
Filling the gaps in gene annotation

[Diagram: NCBI Entrez Gene: 334 → Gene Wiki mapping → wikilink → GO exact match (Annotator) → candidate assertion: GO:0006897]
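The mining step on these two slides can be sketched roughly as follows. This is a minimal illustration, not the pipeline from Good, BMC Genomics 2011: the lookup tables, the article title, the wikitext snippet, and the function name are all hypothetical, and the label "endocytosis" for GO:0006897 is assumed here and should be checked against the Gene Ontology. The idea is only that a wikilink whose target exactly matches a GO term name or synonym yields a candidate gene-to-GO assertion.

```python
import re

# Hypothetical stand-in for the Gene Wiki mapping: article title -> Entrez Gene ID.
ARTICLE_TO_GENE = {"Example gene article": 334}

# Hypothetical stand-in for GO names and exact synonyms (lower-cased) -> GO ID.
# The label for GO:0006897 is an assumption to verify against GO itself.
GO_EXACT_NAMES = {"endocytosis": "GO:0006897"}

# Captures the target of [[Target]], [[Target|label]] or [[Target#Section]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def candidate_assertions(article_title, wikitext):
    """Return (gene_id, go_id) pairs for wikilinks that exactly match a GO name."""
    gene_id = ARTICLE_TO_GENE.get(article_title)
    if gene_id is None:
        return []
    found = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip().lower()
        go_id = GO_EXACT_NAMES.get(target)
        if go_id is not None and (gene_id, go_id) not in found:
            found.append((gene_id, go_id))
    return found


# Made-up wikitext, for illustration only.
text = "The protein is involved in [[endocytosis]] and binds [[Heparin|heparin]]."
pairs = candidate_assertions("Example gene article", text)
print(pairs)
```

Exact matching on link targets keeps the sketch simple; a real annotator would also need to handle redirects, plural forms, and terms mentioned in running text rather than in links.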

15
Novel GO annotations – so what?

11,022 annotations mined from Gene Wiki
~100,000 annotations from GO consortium
4703 (43%) match known annotations
6319 “novel” annotations @ 48-64% specificity

16
Gene Wiki content improves enrichment analysis
Workflow: GO term axon guidance (GO:0007411) → gene list (264 genes) → PubMed abstracts (811 articles) → concept recognition → enrichment analysis

                                      GO:0007411
                                      Yes      No
Linked genes through PubMed   Yes     13       2
                              No      251      12033

P = 1.55E-20
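The slide does not say which test produced this P value. As a hedged sketch, assuming a one-sided Fisher's exact test (a hypergeometric tail) on the 2×2 table above, the calculation could look like this; the function name is illustrative, and only the four counts come from the slide.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Upper-tail hypergeometric probability for the 2x2 table [[a, b], [c, d]].

    a: linked through PubMed and annotated with the GO term
    b: linked through PubMed, not annotated
    c: not linked, annotated
    d: not linked, not annotated
    """
    linked = a + b              # row total: genes linked through PubMed
    annotated = a + c           # column total: genes with the GO term
    total = a + b + c + d
    upper = min(linked, annotated)
    # Sum the probabilities of tables with an overlap at least as large as a.
    numerator = sum(
        comb(annotated, k) * comb(total - annotated, linked - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, linked)


# Counts from the axon guidance (GO:0007411) table.
p = fisher_one_sided(13, 2, 251, 12033)
print(f"one-sided P = {p:.3g}")
```

Using exact integer binomials avoids floating-point underflow for tails this small; the printed value can be compared against the P reported on the slide.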

17
Workflow: GO term muscle contraction (GO:0006936) → gene list (87 genes) → PubMed abstracts (251 articles) + Gene Wiki (87 articles) → concept recognition → enrichment analysis

Enrichment for GO:0006936:
Linked genes through PubMed: P = 1.0
Linked genes through PubMed + Gene Wiki: P = 1.22E-09

18

[Figure: scatter plot of enrichment p-values, PubMed + GW against PubMed only. One region is marked "More significant with PubMed only" and the other "More significant with PubMed + GW"; muscle contraction is labeled.]

19
Gene Wiki+ for integrative queries

mwsync

http://genewikiplus.org

20
Dynamic queries across genes, diseases, SNPs

23

mwsync

OMIM
PharmGKB

{{#ask:
[[Category:Human_proteins]]
[[is_associated_with::<q>[[Category:Breast_cancer]]</q>]]
[[HasSNP::
…

<q>[[is_associated_with::

24

mwsync

OMIM
PharmGKB


25

The Long Tail of scientists is a valuable source of information on gene function.

26
Crowdsourcing a gene annotation portal

27
Collaborators
Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project

Group members
Erik Clarke
Ian Macleod
Ben Good
Max Nanis
Salvatore Loguercio
Chunlei Wu

ISMB travel support

Contact
http://sulab.org
asu@scripps.edu
@andrewsu
+Andrew Su

Funding and Support

(BioGPS: GM83924, Gene Wiki: GM089820)
