1. Games for improving human phenotype prediction
Benjamin M Good, Salvatore Loguercio, Andrew I Su
The Scripps Research Institute, La Jolla, California, USA
ABSTRACT
An important goal for biomedical research is to produce genetic and genomic predictors for human phenotypes such as disease prognosis or drug response. To this end, we can now quantify an extremely large number of potential biomarkers for any biological sample. In fact, a single sample could reasonably be described by millions of molecular variations in DNA, RNA, proteins, and metabolites. However, the actual number of samples processed typically remains small in comparison. As a result, attempts to use this data to build predictors often face problems of overfitting. (While a predictive pattern may describe training data very well, it may not reproduce well on other datasets.)

It has recently been shown that biological knowledge in the form of gene annotations and pathway databases can be used to guide the process of inferring phenotype predictors [1-3]. While promising, such methods are limited by the amount, quality and problem-specific applicability of the structured knowledge that is available.

Following in the line of games that have recently demonstrated success as a means of ‘crowdsourcing’ difficult biological problems [4,5], we are developing games with the purpose of improving human phenotype predictions. Our games work on two levels: (1) games such as Dizeez and GenESP collect novel gene annotations and (2) games like Combo engage players directly in the process of predictor inference.

Play game prototypes at: http://www.genegames.org

Even after limited game playing, the Dizeez game resulted in the identification of several novel gene-disease annotations.

Dizeez: gene – disease annotation quiz
Select the disease related to the clue gene. Guess as many as you can in one minute.
Every guess adds weight to a link between a gene and a disease.

Preliminary Results
713 games, 180 players.
Overall: 4,585 unique gene-disease assertions.
224 assertions provided more than once and not found in OMIM/PharmGKB.
[Figure: top associations provided four or more times and not found in OMIM/PharmGKB.]

Combo: feature selection with community intelligence
• Goal: pick the best set of genes
• Best: the gene set that produces the best decision tree classifier
• Classifier: created using training data and selected genes, used to predict phenotype (e.g. breast cancer prognosis)
[Figure panels: a game board; a hand; inferred decision tree, Score: 78 (percent correct)]
Game Score: determined by estimating performance of trees constructed using the selected features on training data.
Feature sets from many individual games used to create a Decision Tree Forest classifier. (Each tree votes once.)
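
The Combo panel's scoring and forest voting can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names (`GENE_A`, `GENE_B`, `GENE_C`), the toy expression values, the Gini-based splits, the depth limit, and the use of leave-one-out cross-validation to estimate performance on the training data are all assumptions made for illustration. It shows a tree restricted to a player's selected genes, a percent-correct game score for one hand, and a forest in which each hand's tree votes once.

```python
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, genes, max_depth=3):
    """Grow a small decision tree that may only split on the given genes."""
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None  # (weighted impurity, gene, threshold)
    for g in genes:
        values = sorted({r[g] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[g] <= t]
            right = [y for r, y in zip(rows, labels) if r[g] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, g, t)
    if best is None:  # no usable split among the selected genes
        return majority(labels)
    _, g, t = best
    li = [i for i, r in enumerate(rows) if r[g] <= t]
    ri = [i for i, r in enumerate(rows) if r[g] > t]
    return (g, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], genes, max_depth - 1),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], genes, max_depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        g, t, left, right = tree
        tree = left if row[g] <= t else right
    return tree

def game_score(rows, labels, hand):
    """Percent correct, estimated by leave-one-out cross-validation (an assumption)."""
    hits = 0
    for i in range(len(rows)):
        tree = build_tree(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:], hand)
        hits += predict(tree, rows[i]) == labels[i]
    return round(100 * hits / len(rows))

def forest_predict(trees, row):
    """Each tree votes once; the most common vote wins."""
    return majority([predict(t, row) for t in trees])

# Hypothetical training data: expression values per gene and a prognosis label.
rows = [
    {"GENE_A": 0.90, "GENE_B": 0.1, "GENE_C": 0.8},
    {"GENE_A": 0.80, "GENE_B": 0.7, "GENE_C": 0.7},
    {"GENE_A": 0.70, "GENE_B": 0.3, "GENE_C": 0.2},
    {"GENE_A": 0.85, "GENE_B": 0.9, "GENE_C": 0.9},
    {"GENE_A": 0.20, "GENE_B": 0.2, "GENE_C": 0.3},
    {"GENE_A": 0.10, "GENE_B": 0.8, "GENE_C": 0.1},
    {"GENE_A": 0.30, "GENE_B": 0.4, "GENE_C": 0.6},
    {"GENE_A": 0.25, "GENE_B": 0.6, "GENE_C": 0.4},
]
labels = ["poor", "poor", "poor", "poor", "good", "good", "good", "good"]

# Hands selected in three hypothetical games, one tree per hand.
hands = [["GENE_A", "GENE_B"], ["GENE_A", "GENE_C"], ["GENE_B", "GENE_C"]]
forest = [build_tree(rows, labels, hand) for hand in hands]

new_sample = {"GENE_A": 0.75, "GENE_B": 0.5, "GENE_C": 0.65}
print(game_score(rows, labels, ["GENE_A"]))
print(forest_predict(forest, new_sample))
```

In this sketch the only difference from a random forest is where the feature subsets come from: each subset is a player's hand rather than a random draw.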
Human Guided Forest
Ensemble classifier where components are decision trees constructed using manually selected subsets of features. Adaptation of Network Guided and Random Forests [1,2].

GenESP: gene – concept association with a partner
Guess what genes your partner is thinking about when they see ‘neuroblastoma’.

Improvements compared to Dizeez:
• Reward new, useful annotations with points
• Add social interaction
• Enable gene-gene, gene-disease, gene-function games on the same platform
• Increase scalability of annotation collection (does not depend on a database of ‘right’ answers)

Game Objectives
• Capture general community knowledge in a useful structure
[Diagram labels: Community, gene, pathway, Phenotype]
• Concentrate community knowledge and reasoning around predicting a particular phenotype
[Diagram labels: Phenotype 1, Phenotype 2]

REFERENCES
1. Dutkowski and Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology
2. Winter et al (2012) Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes. PLoS Computational Biology
3. Liu et al (2012) Identifying dysregulated pathways in cancers from pathway interaction networks. BMC Bioinformatics
4. Good and Su (2011) Games with a Scientific Purpose. Genome Biology
5. Kawrykow et al (2012) Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment. PLoS One

CONTACT
Benjamin Good: bgood@scripps.edu
Salvatore Loguercio: loguerci@scripps.edu
Andrew Su: asu@scripps.edu

FUNDING
We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924) and the NIH through the FaceBase Consortium for a particular emphasis on craniofacial genes (DE-20057).