GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)


Talk given at the Genome Informatics conference 2012 at Robinson College, Cambridge University.

Published in: Technology, Health & Medicine
  • Empire State Building
  • One of the seven wonders of the modern world
  • Except for a bit of personal pleasure, that expended effort has no societal value. Over the last ~decade, "serious games" have attempted to harness this resource, for example in training and education and in health and fitness.
  • Question: how to interject biological knowledge in the feature selection process?

    1. GeneGames.org. The Gene Wiki: Crowdsourcing human gene annotation. Andrew Su, Ph.D. (@andrewsu, asu@scripps.edu, http://sulab.org). Genome Informatics, September 6, 2012
    2. The Gene Wiki crib sheet (http://www.slideshare.net/andrewsu)
       • Bulk creation of ~10k Wikipedia articles (http://dx.doi.org/10.1371/journal.pbio.0060175)
       • Monthly stats: > 4 million views, > 1000 edits (http://dx.doi.org/10.1093/nar/gkr925)
       • Text mining reveals novel Gene Ontology and Disease Ontology annotations (http://dx.doi.org/doi:10.1186/1471-2164-12-603)
       • Mash-up with SNPedia for crowdsourced gene-disease database (http://www.jbiomedsem.com/content/3/S1/S6)
       • Merging Wikipedia with the Semantic Web (http://dx.doi.org/10.1093/database/bar060)
    3. Seven million human hours (http://www.flickr.com/photos/archana3k1/4124330493/)
    4. Twenty million human hours (http://www.flickr.com/photos/ableman/2171326385/)
    5. 150 billion human hours per year (http://www.flickr.com/photos/rvp-cw/6243289302/)
    6. Using games to fold proteins (http://fold.it). Fold.it players have successfully:
       • Outperformed state-of-the-art protein folding algorithms (Cooper, Nature, 2010)
       • Solved a previously intractable crystal structure (Khatib, Nat Struct Mol Biol, 2011)
       • Designed an improved protein folding algorithm (Khatib, PNAS, 2011)
       • Improved enzyme activity of a de novo designed enzyme (Eiben, Nat Biotechnol, 2011)
    7. Using games to fold RNAs (http://eterna.cmu.edu/)
    8. Using games to align sequences (http://phylo.cs.mcgill.ca)
    9. Using games to annotate genes? (http://genegames.org)
    10. No good gene-disease annotation database. Query: Apolipoprotein E
       • Alzheimer's disease (AD)
       • Lipoprotein glomerulopathy
       • Sea-blue histiocyte disease
    11. No good gene-disease annotation database. Query: Apolipoprotein E
       • Alzheimer's disease (AD)
       • Lipoprotein glomerulopathy
       • Sea-blue histiocyte disease
       • Hyperlipoproteinemia, type III
       • Macular degeneration, age-related
       • Myocardial infarction susceptibility
    12. No good gene-disease annotation database. Query: Apolipoprotein E
       • ? Alzheimer's disease (AD)
       • ? Lipoprotein glomerulopathy
       • ? Sea-blue histiocyte disease
       • Hyperlipoproteinemia, type III
       • ? Macular degeneration, age-related
       • ? Myocardial infarction susceptibility
       • HIV
       • Psoriasis
       • Vascular Diseases
    13. No good gene-disease annotation database. Query: Apolipoprotein E. 477 diseases! Alzheimer's disease (AD); Memory; Coronary Artery Disease; Neuropsychological Tests; Hypertension; Cognition Disorders; Mental Status Schedule; Psychiatric Status Rating Scales; Dementia; Cognition; Hyperlipidemias; Atrophy; Disease Progression; Dementia, Vascular; Cardiovascular Diseases; Parkinson Disease; Brain Injuries; Coronary Disease; Myocardial Infarction; Diabetes Mellitus, Type 2; …; Memory Disorders
    14. Play Dizeez to annotate gene-disease links
       1. Read the clue (gene)
       2. Click the related disease (only one is "right")
       3. If it's "right", you get points
       4. Then on to the next question…
       5. Hurry!
       6. Play to win!
    15. Dizeez players seem pretty smart… In total (since Dec 2011): 207 unique gamers, 1045 games played, 8525 guesses. Table columns: # Occurrences, Gene, Disease, Pubmed, OMIM, PharmGKB, Gene Wiki (per-database marks not preserved).
       • 7, GAST, gastrinoma
       • 7, RBP3, retinoblastoma
       • 7, SSX1, synovial sarcoma
       • 6, TG, Graves disease
       • 6, CRYGC, Cataract
       • 6, SOX8, mental retardation
       • 6, WRN, Werner syndrome
       • 6, ABL1, leukemia
       • 6, MLL3, leukemia
       • 6, SNAI2, breast carcinoma
    16. Dizeez players seem pretty smart… In total (since Dec 2011): 207 unique gamers, 1045 games played, 8525 guesses. Table columns: # Occurrences, Gene, Disease, Pubmed, OMIM, PharmGKB, Gene Wiki (per-database marks not preserved).
       • 5, MECOM, sarcoma
       • 4, ATF7, cancer
       • 3, ABCB5, acute myeloid leukemia
       • 3, SART1, glioblastoma
       • 3, NCK1, leukemia
       • 3, NEK1, cancer
    17. Using games to predict phenotype from genotype? The Cure (http://genegames.org)
    18. Classification problems in genome biology. Find patterns in training data (cancer vs. normal; 100,000s of features, 100s of samples), then classify new samples. Methods: SVM, neural networks, Naïve Bayes, KNN, …
    19. Random forests. Sample a subset of cases and features, then train a decision tree (cancer vs. normal; 100,000s of features, 100s of samples).
    20. Random forests (cancer vs. normal; 100,000s of features, 100s of samples)
    21. Random forests. Classify new samples as cancer or normal (100,000s of features, 100s of samples). How to interject biological knowledge?
    22. Network-guided forests. Dutkowski & Ideker (2011). PLoS Computational Biology
    23. Network-guided forests. Sample features by PPI network, then train a decision tree (cancer vs. normal; 100,000s of features, 100s of samples).
    24. Human-guided forests. Sample features by human intelligence, then train a decision tree (cancer vs. normal; 100,000s of features, 100s of samples).
    25. The Cure: Genomic predictors for disease
    26. The Cure: Genomic predictors for disease
    27. The Cure: Genomic predictors for disease
    28. The Cure: Genomic predictors for disease
    29. The Cure: Genomic predictors for disease
    30. The Cure: Genomic predictors for disease
    31. Human-guided forests. Classify new samples as cancer or normal.
    32. "Critical Assessment"-style challenge. Will this work? Check our blog after October 15.
    33. Acknowledgements
       • Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project
       • Group members: Ben Good, Max Nanis, Salvatore Loguercio, Chunlei Wu, Ian Macleod
       • Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su, @genegame
       • Recruiting graduate students in quantitative biology! See http://education.scripps.edu/
       • Funding and Support: BioGPS: GM83924, Gene Wiki: GM089820
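
Slides 19 to 24 describe forests in which each tree is trained on a sampled subset of cases and features, with the feature sampling either uniform (random forests) or biased by a PPI network or by human intelligence (guided forests). The Python sketch below illustrates that idea under several assumptions that are not in the slides: each "tree" is simplified to a single-split stump, cases are drawn by bootstrap, and the bias is expressed as a hypothetical list of per-feature weights (for example, how often players picked a gene). The function names and toy data are illustrative only and are not taken from The Cure.

```python
import random
from collections import Counter


def sample_features(weights, k, rng):
    """Draw k distinct feature indices, favouring higher-weight features."""
    pool = dict(enumerate(weights))
    chosen = []
    for _ in range(k):
        ids = list(pool)
        pick = rng.choices(ids, weights=[pool[i] for i in ids], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


def train_stump(X, y, feature_ids):
    """Fit a one-split tree: the (feature, threshold, labels) with fewest errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            for hi, lo in (("cancer", "normal"), ("normal", "cancer")):
                errors = sum((hi if row[f] > t else lo) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, hi, lo)
    return best[1:]


def predict_stump(stump, row):
    f, t, hi, lo = stump
    return hi if row[f] > t else lo


def train_forest(X, y, feature_weights, n_trees=25, n_features=2, seed=0):
    """Train n_trees stumps, each on a bootstrap of cases and sampled features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample of cases
        feats = sample_features(feature_weights, n_features, rng)
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Classify a new sample by majority vote over the trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): 6 samples, 3 expression-like features.
X = [[9, 8, 7], [8, 9, 9], [7, 7, 8], [1, 2, 1], [2, 1, 3], [3, 2, 2]]
y = ["cancer"] * 3 + ["normal"] * 3

uniform = [1, 1, 1]  # plain random forest: every feature equally likely
guided = [5, 1, 1]   # guided forest: assumed prior favouring feature 0

forest = train_forest(X, y, guided)
print(predict_forest(forest, [10, 10, 10]))
```

Passing `uniform` instead of `guided` gives the unguided case; the only thing that changes between the two is which features each tree is allowed to see.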