ASHG poster - Games for gene annotation and phenotype classification



Andrew I. Su, Salvatore Loguercio, Benjamin M. Good
Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA

ABSTRACT

The Empire State Building was built with 7 million hours of human effort. The Panama Canal took 20 million hours to complete. By comparison, it is estimated that up to 150 billion hours are spent playing games every year (9 billion on Solitaire alone). Obviously people play games because they are enjoyable and fun. But aside from that enjoyment, games largely result in no tangible benefit, neither to the individual nor to society at large.

Recently, several groups have built "games with a purpose", a class of games that focuses on collaboratively harnessing gamers for productive ends. In biology, games have been built to fold proteins and RNAs, and to perform multiple sequence alignment. Here, we present our efforts to apply games to two critical challenges in genetics.

First, we have built games focused on organizing and structuring gene annotations. With the increasing popularity of genome-scale science, many analysis strategies (including gene set enrichment, pathway analysis, and cross-species comparisons) depend on comprehensive and accurate gene annotations. These structured annotations are mostly the result of centralized manual curation efforts, but these initiatives do not scale well with the explosive growth of the biomedical literature. We describe several games that [...] working biologists to extract their expert domain knowledge in computable form.

Second, we describe a game for predicting human phenotypes from molecular descriptors. Researchers can now relatively easily characterize any biological sample according to a number of features, including genotype, gene expression, and epigenetics. A key challenge in the field is identifying exactly which of those molecular features can be used to predict a clinical phenotype like disease susceptibility or adverse drug events. While statistical classifiers have been applied to this challenge, they typically do not incorporate prior biological knowledge, and they often fail to replicate in external test populations. Here, we present results from 'The Cure', a game to help identify biomarker gene sets that can be used to improve predictions of breast cancer prognosis based on gene expression.

Play these games now!!! at:

Game 1: Dizeez

  • Purpose: identify new gene-disease links
  • Rules:
      • Select biological area (e.g. 'cancer') to start game.
      • Given a gene, guess the related disease.
      • Points are awarded for correct guesses within one minute.
      • 'Correct' answers drawn from text mining
  • Data:
      • When several different players suggest the same 'incorrect' gene-disease link, we detect a new candidate gene annotation.

Dizeez Results

  • Time frame: 2 months
  • Unique players: 230
  • Games played: 1,045
  • Guesses collected: 8,525
  • Unique gene-disease pairs: 6,941
  • Guesses that match existing annotation: 4,804 (69%)
  • For 14 novel gene-disease pairs guessed by >3 players, 9 (64%) were validated by a literature search
  • Player consensus correlates with probability of validation [1]

Game 2: GenESP

  • Direct reward for consensus formation
  • Multiplayer
  • Open-ended
  • Tested pattern [2]
  • Work in Progress

[Figure: Guess what genes your partner is thinking about when they see 'neuroblastoma']

Game 3: The Cure

The Challenge

[Figure: find patterns (cancer vs. normal); make predictions on new samples]

  • With tens of thousands of measurements but only hundreds of samples, many possible patterns are found.
  • But which ones are real?
  • Prior knowledge encoded in databases has been used to improve classifiers by guiding the search for predictive gene sets [3]
  • What about knowledge that is not recorded in structured databases?
  • The Cure is designed to motivate and enable people to help improve the feature selection step for predictor [...]

The Game

  • Goal: pick the best set of genes.
  • Best: the gene set that produces the best decision tree classifier of breast cancer prognosis.
  • Classifier: created using training data and selected genes, used to predict phenotype.
  • Score: cross-validation performance of decision tree using selected genes and training data.

[Figure labels: Gene info. provided from Gene Ontology, GeneRIFs. Search box highlights genes with annotation match. Decision trees built automatically using genes in player's hand. Your current 'hand'; round ends at 5 cards. Clinical data (Age, etc.).]

RESULTS

  • 214 players registered (125 in 1st week): 40% have a PhD.
  • 3,954 games played in 47 days
  • Predictor scored 69% correct on Sage Breast Cancer Prognosis Challenge test set. [4]
  • (Best of all submitted predictors scored 72%)
  • Awaiting results on external validation set.

[Figure: Genes selected at highest frequency]

REFERENCES

  1. Salvatore Loguercio, Benjamin M. Good, Andrew I. Su (2012) Dizeez: an online game for human gene-disease annotation. In: Bio-Ontologies SIG, ISMB: 15 July 2011, Vienna.
  2. Luis Von Ahn and Laura Dabbish (2004) Labeling images with a computer game. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
  3. Janus Dutkowski and Trey Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology
  4. Sage bionetworks: DREAM7 Breast Cancer Prognosis Challenge. http://www.the-dream-

Contact and Acknowledgements

Benjamin Good: @bgood , Andrew Su:

We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924) and the NIH through the FaceBase Consortium for a particular emphasis on craniofacial genes (DE-20057).
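
The Dizeez data rule described on the poster (several different players suggesting the same 'incorrect' gene-disease link yields a candidate annotation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `candidate_links`, the tuple layout of a guess, and the placeholder gene and disease names are all assumptions made for illustration. The default threshold of 4 distinct players mirrors the poster's "guessed by >3 players" criterion.

```python
from collections import defaultdict


def candidate_links(guesses, known_links, min_players=4):
    """Return novel (gene, disease) pairs and how many distinct players guessed them.

    guesses:      iterable of (player_id, gene, disease) tuples (hypothetical layout)
    known_links:  set of (gene, disease) pairs already present in the annotation
    min_players:  minimum number of distinct players required (">3 players" -> 4)
    """
    players_by_pair = defaultdict(set)
    for player, gene, disease in guesses:
        pair = (gene, disease)
        # Guesses matching existing annotation are 'correct' and not novel.
        if pair not in known_links:
            # A set ensures repeated guesses by one player count only once.
            players_by_pair[pair].add(player)
    return {
        pair: len(players)
        for pair, players in players_by_pair.items()
        if len(players) >= min_players
    }


# Toy data with placeholder names; none of this comes from the poster.
toy_known = {("GENE_C", "disease_z")}
toy_guesses = [
    ("p1", "GENE_A", "disease_x"),
    ("p2", "GENE_A", "disease_x"),
    ("p3", "GENE_A", "disease_x"),
    ("p4", "GENE_A", "disease_x"),
    ("p1", "GENE_A", "disease_x"),  # repeat by the same player
    ("p1", "GENE_B", "disease_y"),
    ("p2", "GENE_B", "disease_y"),
    ("p1", "GENE_C", "disease_z"),  # already annotated
]
toy_candidates = candidate_links(toy_guesses, toy_known)
print(toy_candidates)
```

Counting distinct players rather than raw guesses keeps one enthusiastic player from creating a consensus alone, which matches the poster's observation that player consensus correlates with the probability of validation.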