Except for a bit of personal pleasure, that expended effort has no societal value.
Over the last ~decade, “serious games” have attempted to harness this resource:
• Training and education
• Health and fitness
Question: how to interject biological knowledge in the feature selection process?
GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
GeneGames.org / The Gene Wiki: Crowdsourcing human gene annotation
Andrew Su, Ph.D.
@andrewsu
firstname.lastname@example.org
http://sulab.org
Genome Informatics, September 6, 2012
2. The Gene Wiki crib sheet
http://www.slideshare.net/andrewsu
• Bulk creation of ~10k Wikipedia articles (http://dx.doi.org/10.1371/journal.pbio.0060175)
• Monthly stats: > 4 million views, > 1000 edits (http://dx.doi.org/10.1093/nar/gkr925)
• Text mining reveals novel Gene Ontology and Disease Ontology annotations (http://dx.doi.org/doi:10.1186/1471-2164-12-603)
• Mash-up with SNPedia for crowdsourced gene-disease database (http://www.jbiomedsem.com/content/3/S1/S6)
• Merging Wikipedia with the Semantic Web (http://dx.doi.org/10.1093/database/bar060)
3. Seven million human hours
http://www.flickr.com/photos/archana3k1/4124330493/
4. Twenty million human hours
http://www.flickr.com/photos/ableman/2171326385/
5. ~150 billion human hours per year
http://www.flickr.com/photos/rvp-cw/6243289302/
6. Using games to fold proteins
Fold.it players have successfully:
• Outperformed state-of-the-art protein folding algorithms (Cooper, Nature, 2010)
• Solved a previously intractable crystal structure (Khatib, Nat Struct Mol Biol, 2011)
• Designed an improved protein folding algorithm (Khatib, PNAS, 2011)
• Improved enzyme activity of a de novo designed enzyme (Eiben, Nat Biotechnol, 2011)
http://fold.it
7. Using games to fold RNAs
http://eterna.cmu.edu/
8. Using games to align sequences
http://phylo.cs.mcgill.ca
9. Using games to annotate genes?
http://genegames.org
10. No good gene-disease annotation database
Query: Apolipoprotein E
Alzheimer's disease (AD); Lipoprotein glomerulopathy; Sea-blue histiocyte disease
11. No good gene-disease annotation database
Query: Apolipoprotein E
Alzheimer's disease (AD); Lipoprotein glomerulopathy; Sea-blue histiocyte disease; Hyperlipoproteinemia, type III; Macular degeneration, age-related; Myocardial infarction susceptibility
12. No good gene-disease annotation database
Query: Apolipoprotein E
? Alzheimer's disease (AD) ? Lipoprotein glomerulopathy ? Sea-blue histiocyte disease Hyperlipoproteinemia, type III ? Macular degeneration, age-related ? Myocardial infarction susceptibility HIV Psoriasis Vascular Diseases
14. Play Dizeez to annotate gene-disease links
1. Read the clue (gene)
2. Click the related disease (only one is “right”)
3. If it's “right”, you get points
4. Then on to the next question…
5. Hurry!
6. Play to win!
15. Dizeez players seem pretty smart…
In total (since Dec 2011):
• 207 unique gamers
• 1045 games played
• 8525 guesses
Slide table columns: # Occurrences, Gene, Disease, Pubmed, OMIM, PharmGKB, Gene Wiki (the per-database marks did not survive in this transcript)
# Occurrences   Gene    Disease
7               GAST    gastrinoma
7               RBP3    retinoblastoma
7               SSX1    synovial sarcoma
6               TG      Graves disease
6               CRYGC   Cataract
6               SOX8    mental retardation
6               WRN     Werner syndrome
6               ABL1    leukemia
6               MLL3    leukemia
6               SNAI2   breast carcinoma
16. Dizeez players seem pretty smart…
Slide table columns: # Occurrences, Gene, Disease, Pubmed, OMIM, PharmGKB, Gene Wiki (the per-database marks did not survive in this transcript)
# Occurrences   Gene    Disease
5               MECOM   sarcoma
4               ATF7    cancer
3               ABCB5   acute myeloid leukemia
3               SART1   glioblastoma
3               NCK1    leukemia
3               NEK1    cancer
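The "# Occurrences" column suggests that repeated guesses for the same gene-disease pair are tallied across games. A minimal sketch of such a tally follows, assuming a guess log of (player, gene, disease) tuples and assuming each player counts at most once per pair. The log format, the `tally_links` helper, and the `min_count` cutoff are hypothetical and not taken from the Dizeez code.

```python
from collections import Counter

# Hypothetical guess log: (player_id, gene, disease). Not real Dizeez data.
guesses = [
    ("p1", "GAST", "gastrinoma"),
    ("p2", "GAST", "gastrinoma"),
    ("p2", "GAST", "gastrinoma"),   # repeat by the same player, counted once
    ("p3", "WRN", "Werner syndrome"),
    ("p1", "WRN", "Werner syndrome"),
    ("p2", "GAST", "cataract"),
]

def tally_links(guesses, min_count=2):
    """Count distinct players per (gene, disease) pair; keep pairs with >= min_count."""
    unique = set(guesses)  # drop repeated guesses by the same player
    counts = Counter((gene, disease) for _, gene, disease in unique)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

candidates = tally_links(guesses)
for (gene, disease), n in candidates:
    print(n, gene, disease)
```

Pairs that clear the cutoff could then be checked against existing resources such as the databases named in the slide's column headers.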
17. Using games to predict phenotype from genotype?
The Cure
http://genegames.org
18. Classification problems in genome biology
[Diagram: a data matrix of 100s of samples (cancer, normal) by 100,000s of features. Find patterns with a classifier (SVM, neural networks, Naïve Bayes, KNN, …), then classify new samples as cancer or normal.]
19. Random forests
[Diagram: from the matrix of 100s of samples by 100,000s of features, sample a subset of cases and features, then train a decision tree.]
20. Random forests
[Diagram: data matrix of 100s of samples (cancer, normal) by 100,000s of features.]
21. Random forests
[Diagram: classify new samples as cancer or normal from the matrix of 100s of samples by 100,000s of features.]
How to interject biological knowledge?
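The random forest steps on these slides (sample a subset of cases and features, train a tree, repeat, then classify new samples by vote) can be sketched in plain Python. This is a simplified stand-in, not the talk's implementation: each "tree" is a one-level decision stump rather than a full decision tree, and the toy expression values, function names, and parameters are all hypothetical.

```python
import random
from collections import Counter

def train_stump(X, y, features):
    """Return the (feature, threshold, label_above, label_below) with fewest errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for above, below in (("cancer", "normal"), ("normal", "cancer")):
                errors = sum(
                    (above if row[f] > t else below) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above, below)
    return best[1:]

def train_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Train each stump on a bootstrap sample of cases and a random feature subset."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # sample cases with replacement
        feats = rng.sample(range(len(X[0])), n_features)  # sample a subset of features
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Classify a new sample by majority vote over all stumps."""
    votes = Counter(above if row[f] > t else below for f, t, above, below in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 8 samples by 4 features (real data: 100s by 100,000s).
X = [
    [5.1, 6.0, 7.2, 5.5], [6.3, 5.2, 8.1, 6.4], [7.0, 7.7, 5.9, 7.1], [8.2, 6.6, 6.8, 8.0],
    [1.0, 2.1, 1.5, 3.0], [2.2, 1.3, 2.8, 1.9], [3.0, 2.9, 1.1, 2.4], [1.7, 1.8, 2.2, 1.2],
]
y = ["cancer"] * 4 + ["normal"] * 4

forest = train_forest(X, y)
print(predict(forest, [9.0, 9.0, 9.0, 9.0]))
print(predict(forest, [0.5, 0.5, 0.5, 0.5]))
```

The line that draws `feats` is where the feature subset is chosen at random, which is the step the slide's question about interjecting biological knowledge points at.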
31. Human-guided forests
[Diagram: classify new samples as cancer or normal.]
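One way to read "human-guided" is that player-chosen genes replace the random feature draw when each tree is built. The sketch below covers only that feature-selection step, under the assumption that every player "hand" of gene symbols defines the feature set for one tree. The `hands_to_feature_sets` helper, the hand format, and the column mapping are hypothetical, and the actual scheme used by The Cure may differ.

```python
def hands_to_feature_sets(hands, gene_index):
    """Translate player-chosen gene symbols into column indices, one set per tree.

    hands: list of lists of gene symbols picked by players (hypothetical format).
    gene_index: dict mapping gene symbol -> column index in the data matrix.
    Symbols missing from the matrix are skipped; empty hands yield no tree.
    """
    feature_sets = []
    for hand in hands:
        cols = sorted({gene_index[g] for g in hand if g in gene_index})
        if cols:
            feature_sets.append(cols)
    return feature_sets

# Hypothetical column mapping and player hands.
gene_index = {"ABL1": 0, "MLL3": 1, "SNAI2": 2, "WRN": 3}
hands = [["ABL1", "MLL3"], ["SNAI2", "UNKNOWN"], ["NOPE"]]

feature_sets = hands_to_feature_sets(hands, gene_index)
print(feature_sets)
```

Each resulting feature set would then be passed to a tree trainer in place of a randomly sampled subset.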
32. “Critical Assessment”-style challenge
Will this work? Check our blog after October 15.
33. Collaborators
• Doug Howe, ZFIN
• John Hogenesch, U Penn
• Jon Huss, GNF
• Luca de Alfaro, UCSC
• Angel Pizzaro, U Penn
• Faramarz Valafar, SDSU
• Pierre Lindenbaum, Fondation Jean Dausset
• Michael Martone, Rush
• Konrad Koehler, Karo Bio
• Warren Kibbe, Simon Lim, Northwestern
• Many Wikipedia editors
• WP:MCB Project
Group members
• Ben Good
• Max Nanis
• Salvatore Loguercio
• Chunlei Wu
• Ian Macleod
Contact
• http://sulab.org
• email@example.com
• @andrewsu
• +Andrew Su
• @genegame
Recruiting graduate students in quantitative biology! See http://education.scripps.edu/
Funding and Support (BioGPS: GM83924, Gene Wiki: GM089820)