Games for improving human phenotype prediction
Benjamin M Good, Salvatore Loguercio, Andrew I Su
The Scripps Research Institute, La Jolla, California, USA


ABSTRACT

An important goal for biomedical research is to produce genetic and genomic predictors for human phenotypes such as disease prognosis or drug response. To this end, we can now quantify an extremely large number of potential biomarkers for any biological sample. In fact, a single sample could reasonably be described by millions of molecular variations in DNA, RNA, proteins, and metabolites. However, the actual number of samples processed typically remains small in comparison. As a result, attempts to use this data to build predictors often face problems of overfitting. (While a predictive pattern may describe training data very well, it may not reproduce well on other datasets.)

It has recently been shown that biological knowledge in the form of gene annotations and pathway databases can be used to guide the process of inferring phenotype predictors [1-3]. While promising, such methods are limited by the amount, quality and problem-specific applicability of the structured knowledge that is available.

Following in the line of games that have recently demonstrated success as a means of ‘crowdsourcing’ difficult biological problems [4,5], we are developing games with the purpose of improving human phenotype predictions. Our games work on two levels: (1) games such as Dizeez and GenESP collect novel gene annotations and (2) games like Combo engage players directly in the process of predictor inference.

Play game prototypes at: http://www.genegames.org

Game Objectives

• Capture general community knowledge in a useful structure
  [Figure labels: Community; Phenotype, gene, pathway, gene]
• Concentrate community knowledge and reasoning around predicting a particular phenotype
  [Figure labels: Phenotype 1, Phenotype 2]

Dizeez: gene – disease annotation quiz

Select the disease related to the clue gene. Guess as many as you can in one minute. Every guess adds weight to a link between a gene and a disease.

Preliminary Results

• 713 games, 180 players
• Overall: 4,585 unique gene-disease assertions
• 224 assertions provided more than once and not found in OMIM/PharmGKB

[Table: top associations provided four or more times and not found in OMIM/PharmGKB]

Even after limited game playing, the Dizeez game resulted in the identification of several novel gene-disease annotations.

GenESP: gene – concept association with a partner

Guess what genes your partner is thinking about when they see ‘neuroblastoma’.

Improvements compared to Dizeez:
• Reward new, useful annotations with points
• Add social interaction
• Enable gene-gene, gene-disease, gene-function games on the same platform
• Increase scalability of annotation collection (does not depend on a database of ‘right’ answers)

Combo: feature selection with community intelligence

• Goal: pick the best set of genes
• Best: the gene set that produces the best decision tree classifier
• Classifier: created using training data and selected genes, used to predict phenotype (e.g. breast cancer prognosis)

[Figure panels: a game board; a hand; inferred decision tree. Score: 78 (percent correct)]

Game Score: determined by estimating performance of trees constructed using the selected features on training data.

Feature sets from many individual games are used to create a Decision Tree Forest classifier. (Each tree votes once.)

Human Guided Forest

Ensemble classifier where components are decision trees constructed using manually selected subsets of features. Adaptation of Network Guided and Random Forests [1,2].

REFERENCES

1. Dutkowski and Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology
2. Winter et al (2012) Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes. PLoS Computational Biology
3. Liu et al (2012) Identifying dysregulated pathways in cancers from pathway interaction networks. BMC Bioinformatics
4. Good and Su (2011) Games with a Scientific Purpose. Genome Biology
5. Kawrykow et al (2012) Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment. PLoS One

CONTACT

Benjamin Good: bgood@scripps.edu
Salvatore Loguercio: loguerci@scripps.edu
Andrew Su: asu@scripps.edu

FUNDING

We acknowledge support from the National Institute of General Medical Sciences (GM089820 and GM083924) and the NIH through the FaceBase Consortium for a particular emphasis on craniofacial genes (DE-20057).
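
As an illustration of the Dizeez aggregation step (every guess adds weight to a gene-disease link, and repeated assertions absent from OMIM/PharmGKB are flagged as potentially novel), here is a minimal Python sketch. It is not the authors' implementation: the function names, the placeholder genes and diseases, and the in-memory `known` set standing in for an OMIM/PharmGKB lookup are all assumptions made for this example.

```python
from collections import Counter


def tally_assertions(guesses):
    """Count how often each (gene, disease) pair was guessed.

    Gene symbols are upper-cased and disease names lower-cased so that
    trivially different spellings add weight to the same link.
    """
    return Counter((gene.upper(), disease.lower()) for gene, disease in guesses)


def novel_candidates(weights, known_pairs, min_count=2):
    """Return pairs guessed at least min_count times and absent from known_pairs.

    Results are ordered by descending weight, then alphabetically.
    """
    known = {(g.upper(), d.lower()) for g, d in known_pairs}
    hits = [
        pair
        for pair, count in weights.items()
        if count >= min_count and pair not in known
    ]
    return sorted(hits, key=lambda pair: (-weights[pair], pair))


# Made-up guesses for illustration only (not data from the game).
guesses = [
    ("BRCA1", "breast cancer"),
    ("BRCA1", "breast cancer"),
    ("GENE_A", "disease x"),
    ("GENE_A", "disease x"),
    ("GENE_A", "disease x"),
    ("GENE_B", "disease y"),
]

# Hypothetical stand-in for associations already recorded in OMIM/PharmGKB.
known = {("BRCA1", "breast cancer")}

weights = tally_assertions(guesses)
candidates = novel_candidates(weights, known)
print(candidates)
```

Raising `min_count` to 4 would correspond to the stricter "provided four or more times" cut-off mentioned in the preliminary results.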

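
The Human Guided Forest idea (one tree per player-selected gene set, each tree voting once) can be sketched in the same hedged way. To keep the example short and dependency-free, one-level decision stumps stand in for full decision trees, which is a simplification and not the method used in Combo. All names, gene identifiers, and expression values below are hypothetical.

```python
from collections import Counter


def fit_stump(rows, labels, features):
    """Fit a one-level tree restricted to the given feature names.

    Tries every (feature, threshold) split and keeps the first one with
    the fewest training errors. Returns (feature, threshold, left, right).
    """
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # split does not separate anything
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def game_score(stump, rows, labels):
    """Percent of training samples the tree classifies correctly."""
    correct = sum(predict_stump(stump, r) == y for r, y in zip(rows, labels))
    return 100.0 * correct / len(rows)


def fit_forest(rows, labels, gene_sets):
    """One tree per manually selected gene set."""
    return [fit_stump(rows, labels, list(gs)) for gs in gene_sets]


def predict_forest(forest, row):
    """Majority vote; each tree votes once."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Made-up expression values and prognosis labels, for illustration only.
rows = [
    {"g1": 0.1, "g2": 0.9, "g3": 0.5},
    {"g1": 0.2, "g2": 0.8, "g3": 0.4},
    {"g1": 0.8, "g2": 0.2, "g3": 0.6},
    {"g1": 0.9, "g2": 0.1, "g3": 0.5},
]
labels = ["good", "good", "poor", "poor"]

# Hypothetical gene sets, as if chosen by three different players.
gene_sets = [("g1",), ("g2",), ("g1", "g3")]

forest = fit_forest(rows, labels, gene_sets)
print(predict_forest(forest, {"g1": 0.15, "g2": 0.85, "g3": 0.5}))
print(game_score(forest[0], rows, labels))
```

Scoring on the training data alone, as here, is optimistic for the overfitting reasons given in the abstract; a held-out or cross-validated estimate would be the more cautious choice.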