Gene annotation games

Talk at the Salk Institute's 2012 Systems to Synthesis Symposium. Discusses the use of online games with the purpose of annotating the human genome and building better phenotype predictors.

Published in: Technology
  • how many hours in a year
  • Given a list of active genes produced from an experiment: what key biological processes are happening in the cells? What diseases are these genes associated with? Given a list of genetic variations: what diseases is a patient more susceptible to? What drugs should they take/avoid? Etc.
  • more than 21 million in total, actual data back in ?
  • how do you play the game?
  • (about 12 hours) emphasize success
  • disregard what was considered the ‘right’ answer
  • expression of the ABCB5 gene is highly increased in B-precursor ALL
  • These games might be a way to expose the network of human knowledge about the genome in a form that lets us compute with it.
  • a, Two-dimensional presentation of transcript ratios for 98 breast tumours. There were 4,968 significant genes across the group. Each row represents a tumour and each column a single gene. As shown in the colour bar, red indicates upregulation, green downregulation, black no change, and grey no data available. The yellow line marks the subdivision into two dominant tumour clusters. b, Selected clinical data for the 98 patients in a: BRCA1 germline mutation carrier (or sporadic patient), ER expression, tumour grade 3 (versus grade 1 and 2), lymphocytic infiltrate, angioinvasion, and metastasis status. White indicates positive, black negative and grey denotes tumours derived from BRCA1 germline carriers who were excluded from the metastasis evaluation. The cluster below the yellow line consists of 36 tumours, of which 34 are ER negative (total 39 ER-negative) and 16 are carriers of the BRCA1 mutation (total 18). c, Enlarged portion from a containing a group of genes that co-regulate with the ER-α gene (ESR1). Each gene is labelled by its gene name or accession number from GenBank. Contig ESTs ending with RC are reverse-complementary of the named contig EST. d, Enlarged portion from a containing a group of co-regulated genes that are the molecular reflection of extensive lymphocytic infiltrate, and comprise a set of genes expressed in T and B cells. (Gene annotation as in c.)
  • You can’t win by random typing. Agreement between different players produces reliable annotations.
  • Gene annotation games

    1. GAMES FOR HUMAN GENE ANNOTATION. Benjamin Good*, Salvatore Loguercio, Andrew Su, The Scripps Research Institute. http://genegames.org April 20, 2012, 7th Annual Systems to Synthesis Symposium at the Salk Institute
    2. WHY GAMES? It is estimated that 9 billion hours are spent playing Solitaire every year. Von Ahn L. Google Tech Talk: Human Computation. 2006.
    3. Empire State Building: seven million hours of human labor. ONE YEAR SOLITAIRE = 1,285 EMPIRE STATE BUILDINGS. Von Ahn L. Google Tech Talk: Human Computation. 2006.
    4. 150 billion hours. [Bar chart of hours: Empire State Building, 7M; one year of Solitaire, 9B; one year of games, 150B.] McGonigal J. Reality is broken: why games make us better and how they can change the world. New York: Penguin Press; 2011.
    5. GAMES WITH A PURPOSE: Label all images on the Web. Devise protein folding algorithms. Fix multiple sequence alignments. Design RNA molecules.
    6. Annotate all human genes
    7. ANNOTATE ALL HUMAN GENES. Record the relevant properties of each gene in a manner that facilitates computation: biological process; molecular function; cellular localization; interaction partners; disease relevance; genomic location; genetic variations; post-translational modifications; related drugs; related publications; ...
    8. BUILDING AN ANNOTATION: 1. do science 2. publish it 3. curate the knowledge. (Gene; biological process, disease, etc.) Image credits: phillipmartin.info, wikipedia.org/wiki/Manuscript, beyondcomputingmag.com/
    9. WHY DOES HE LOOK SO TIRED?
    10. MANY SCIENTISTS, POWERFUL TOOLS
    11. GROWTH OF POTENTIAL ANNOTATIONS. [Chart: number of articles added to PubMed, vertical axis from 500,000 to 1,000,000.] 112 publications/hour (37 more by the end of this talk).
    12. HOW DO WE INVOLVE THE COMMUNITY IN GENE ANNOTATION?
    13. HOW DO WE INVOLVE THE COMMUNITY IN GENE ANNOTATION? Make it fun!
    14. LINK GENES TO DISEASES WITH DIZEEZ. Click the related disease. If it's ‘right’, you get points. Then on to the next question. Hurry!
    15. DIZEEZ IS FUN... TO SOME PEOPLE. Advertised with a blog post, a few tweets, and a conference poster. Results since Dec. 2011: 180 people have played it; 713 one-minute game rounds have been completed; 4,585 distinct gene-disease associations collected.
    16. QUALITY THROUGH REPLICATION. Distinct gene-disease pairs collected: 4,585. Collected more than once: 482. Potential new annotations (do not appear in OMIM, PharmGKB): 224. Example: ABCB5 → Acute myeloid leukemia
    17. ABCB5 IS RELATED TO ACUTE MYELOID LEUKEMIA
    18. PROBLEMS WITH DIZEEZ. Dizeez actually punishes desired behavior (adding new, unknown associations) by not awarding points. Does not allow players to enter associations other than those in the provided list. GenESP fixes both problems.
    19. (modeled after the ESP Game). See: Ahn and Dabbish (2004) Labeling images with a computer game, SIGCHI
    20. NO DATA YET, PLAY NOW! http://genegames.org
    21. A RE-USABLE PATTERN Gene Disease Gene Function Gene Gene Gene relationship Gene
    22. GENOMIC PREDICTORS: find patterns (cancer, normal); make predictions on new samples (cancer, normal)
    23. THE TRICK IS TO FIND THE RIGHT COMBINATION. Out of the 25,000+ genes, which work together the best? Purely computational approaches have trouble generalizing: they always find patterns, and it is hard to know when they are real.
    24. NETWORK GUIDED FOREST (NGF). Use network to find good gene combinations. Dutkowski & Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology
    25. THE TRICK IS TO FIND THE RIGHT COMBINATION
    26. ‘COMBO’: FIND THE RIGHT COMBINATION OF GENES TO BUILD A PHENOTYPE PREDICTOR
    27. HUMAN GUIDED FOREST (HGF). Let COMBO players build decision modules. http://i9606.blogspot.com/2012/04/human-guided-forests-hgf.html
    28. 150 billion hours... [Bar chart of hours: Empire State Building, 7M; one year of Solitaire, 9B; one year of games, 150B.] McGonigal J. Reality is broken: why games make us better and how they can change the world. New York: Penguin Press; 2011.
    29. THE END. Thanks to: Andrew Su, Salvatore Loguercio. More information at: http://genegames.org http://sulab.org/ bgood@scripps.edu our poster!
    30. GO IS NOT KEEPING UP. [Chart, 2001 to 2011, vertical axis from 0 to 1,200,000: articles indexed in PubMed (>21 million) and total GO annotations created.] 112 publications/hour (37 more by the end of this talk).
    31. ANOTHER MAJOR ANNOTATION PROBLEM: Annotate all the images on the web (dog, firehose, drinking, sprinkler)
    32. A SUCCESSFUL MODEL
    33. ESP GAME RESULTS. First 3 months (2003): 13,630 players added 1,271,451 labels to 293,760 images. Became http://images.google.com/imagelabeler/ and has since scaled up to hundred thousand+ players and tens of millions of images labeled. Ahn and Dabbish (2004) Labeling images with a computer game, SIGCHI
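
The "quality through replication" idea from slide 16 (keep gene-disease pairs that more than one player produced, then set aside the ones missing from existing databases as candidate new annotations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Dizeez implementation: the record layout, the function names `replicated_pairs` and `split_known_novel`, the two-player threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def replicated_pairs(records, min_players=2):
    """Return gene-disease pairs asserted by at least `min_players` distinct players.

    `records` is an iterable of (player_id, gene, disease) tuples.
    This layout is an assumption made for the sketch.
    """
    supporters = defaultdict(set)
    for player, gene, disease in records:
        # A set per pair, so one player repeating an answer counts once.
        supporters[(gene, disease)].add(player)
    return {pair for pair, players in supporters.items()
            if len(players) >= min_players}


def split_known_novel(pairs, known_pairs):
    """Split pairs into those already in a reference set and those that are not."""
    return pairs & known_pairs, pairs - known_pairs


# Toy data, invented for illustration only.
records = [
    ("p1", "ABCB5", "acute myeloid leukemia"),
    ("p2", "ABCB5", "acute myeloid leukemia"),
    ("p1", "BRCA1", "breast cancer"),
    ("p3", "BRCA1", "breast cancer"),
    ("p2", "GENE_X", "disease Y"),
    ("p2", "GENE_X", "disease Y"),  # same player twice: not replication
]
# Stand-in for associations already recorded in a curated database.
known_pairs = {("BRCA1", "breast cancer")}

agreed = replicated_pairs(records)
known, novel = split_known_novel(agreed, known_pairs)
```

Counting distinct players rather than raw answers is the design choice that matters here: as the speaker notes put it, agreement between different players is what makes an annotation reliable, so a single player repeating a guess should not count as replication.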
