Gene annotation games

Talk at the Salk Institute's 2012 Systems to Synthesis Symposium. It discusses using online games to annotate the human genome and to build better phenotype predictors.

Speaker notes:
  • how many hours in a year
  • Given a list of active genes produced from an experiment: what key biological processes are happening in the cells? What diseases are these genes associated with? Given a list of genetic variations: what diseases is a patient more susceptible to? What drugs should they take or avoid? Etc.
  • more than 21 million in total, actual data back in ?
  • how do you play the game?
  • (about 12 hours) emphasize success
  • disregard what was considered the ‘right’ answer
  • expression of the ABCB5 gene is highly increased in B-precursor ALL
  • These games might be a way to expose the network of human knowledge about the genome in a form that lets us compute with it.
  • a, Two-dimensional presentation of transcript ratios for 98 breast tumours. There were 4,968 significant genes across the group. Each row represents a tumour and each column a single gene. As shown in the colour bar, red indicates upregulation, green downregulation, black no change, and grey no data available. The yellow line marks the subdivision into two dominant tumour clusters. b, Selected clinical data for the 98 patients in a: BRCA1 germline mutation carrier (or sporadic patient), ER expression, tumour grade 3 (versus grade 1 and 2), lymphocytic infiltrate, angioinvasion, and metastasis status. White indicates positive, black negative and grey denotes tumours derived from BRCA1 germline carriers who were excluded from the metastasis evaluation. The cluster below the yellow line consists of 36 tumours, of which 34 are ER negative (total 39 ER-negative) and 16 are carriers of the BRCA1 mutation (total 18). c, Enlarged portion from a containing a group of genes that co-regulate with the ER- gene (ESR1). Each gene is labelled by its gene name or accession number from GenBank. Contig ESTs ending with RC are reverse-complementary of the named contig EST. d, Enlarged portion from a containing a group of co-regulated genes that are the molecular reflection of extensive lymphocytic infiltrate, and comprise a set of genes expressed in T and B cells. (Gene annotation as in c.)
  • how many hours in a year
  • You can’t win by random typing. Agreement between different players produces reliable annotations.

1. GAMES FOR HUMAN GENE ANNOTATION. Benjamin Good*, Salvatore Loguercio, Andrew Su. The Scripps Research Institute. http://genegames.org. April 20, 2012. 7th Annual Systems to Synthesis Symposium at the Salk Institute.
2. WHY GAMES? It is estimated that 9 billion hours are spent playing Solitaire every year. (Von Ahn L.: Google Tech Talk: Human Computation, 2006.)
3. Empire State Building: seven million hours of human labor. ONE YEAR OF SOLITAIRE = 1,285 EMPIRE STATE BUILDINGS. (Von Ahn L.: Google Tech Talk: Human Computation, 2006.)
4. 150 billion hours. [Chart: hours of human labor. Empire State Building: 7M; one year of Solitaire: 9B; one year of games: 150B.] (McGonigal J. Reality is broken: why games make us better and how they can change the world. New York: Penguin Press; 2011.)
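The 1,285 figure on slide 3 is simply the ratio of the two hour counts. A quick check using the rounded figures from slides 3 and 4 (7M, 9B and 150B hours); the function name is illustrative:

```python
# Rounded figures from slides 3 and 4, in hours of human labor.
EMPIRE_STATE_HOURS = 7e6   # building the Empire State Building
SOLITAIRE_HOURS = 9e9      # one year of Solitaire
ALL_GAMES_HOURS = 150e9    # one year of all computer games

def in_empire_states(hours: float) -> int:
    """Express a number of hours as whole Empire State Building equivalents (rounded down)."""
    return int(hours // EMPIRE_STATE_HOURS)

print(in_empire_states(SOLITAIRE_HOURS))  # 1285
print(in_empire_states(ALL_GAMES_HOURS))  # 21428
```

By the same ratio, one year of all gaming corresponds to roughly 21,000 Empire State Buildings.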
5. GAMES WITH A PURPOSE: Label all images on the Web; Devise protein folding algorithms; Fix multiple sequence alignments; Design RNA molecules.
6. Annotate all human genes
7. ANNOTATE ALL HUMAN GENES: Record the relevant properties of each gene in a manner that facilitates computation: • biological process • molecular function • cellular localization • interaction partners • disease relevance • genomic location • genetic variations • post-translational modifications • related drugs • related publications • ...
8. BUILDING AN ANNOTATION: 1. do science, 2. publish it, 3. curate the knowledge (Gene → biological process, disease, etc.). Image credits: phillipmartin.info, wikipedia.org/wiki/Manuscript, beyondcomputingmag.com/
9. WHY DOES HE LOOK SO TIRED?
10. MANY SCIENTISTS, POWERFUL TOOLS
11. GROWTH OF POTENTIAL ANNOTATIONS: 112 publications/hour (37 more by the end of this talk). [Chart: number of articles added to PubMed; y-axis 500,000 to 1,000,000.]
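The "37 more by the end of this talk" figure follows from the 112 publications/hour rate if the talk lasts about 20 minutes; that duration is an assumption inferred from the slide's own numbers (37 / 112 hours is just under 20 minutes). A small sketch of the arithmetic, with illustrative function names:

```python
PUBS_PER_HOUR = 112  # PubMed growth rate quoted on slide 11

def pubs_during(minutes: float) -> float:
    """Expected number of new publications over a span of minutes at the quoted rate."""
    return PUBS_PER_HOUR * minutes / 60

def pubs_per_year() -> int:
    """Publications per (365-day) year at the quoted hourly rate."""
    return PUBS_PER_HOUR * 24 * 365

print(round(pubs_during(20)))  # 37  (assuming a ~20-minute talk)
print(pubs_per_year())         # 981120
```

Roughly 981,000 articles per year is in the same range as the chart's y-axis, which tops out at 1,000,000.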
12. HOW DO WE INVOLVE THE COMMUNITY IN GENE ANNOTATION?
13. HOW DO WE INVOLVE THE COMMUNITY IN GENE ANNOTATION? Make it fun!
14. LINK GENES TO DISEASES WITH DIZEEZ: Click the related disease. If it's ‘right’, you get points, then on to the next question. Hurry!
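Slide 14's rules can be modeled as a small scoring loop. A minimal sketch, assuming that a ‘right’ answer means a gene-disease pair already present in some reference set; the class name, the point value and the placeholder genes and diseases are hypothetical, and this is not Dizeez's actual code:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class DizeezRound:
    """Hypothetical model of one Dizeez round: score known links, log every click."""
    known: set[tuple[str, str]]          # (gene, disease) pairs already in a reference set
    points_per_hit: int = 100            # illustrative value, not the game's real scoring
    score: int = 0
    clicks: list[tuple[str, str]] = field(default_factory=list)

    def answer(self, gene: str, disease: str) -> bool:
        """Record the player's click; award points only if the pair is already known."""
        self.clicks.append((gene, disease))   # every click is kept as a candidate annotation
        if (gene, disease) in self.known:
            self.score += self.points_per_hit
            return True
        return False                          # unknown pairs earn nothing

r = DizeezRound(known={("GENE1", "disease A")})
r.answer("GENE1", "disease A")   # known pair: points awarded
r.answer("GENE2", "disease B")   # unknown pair: logged, no points
print(r.score, len(r.clicks))    # 100 2
```

The final branch is the weakness raised on slide 18: a correct but previously unrecorded association scores the same as a wrong one.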
15. DIZEEZ IS FUN... TO SOME PEOPLE • Advertised with a blog post, a few tweets and a conference poster • Results since Dec. 2011: • 180 people have played it • 713 one-minute game rounds have been completed • 4,585 distinct gene-disease associations collected
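The slide 15 totals can be put on a per-round and per-player basis. Because each round lasts one minute, 713 rounds is just under 12 hours of play, which matches the "(about 12 hours)" speaker note. The variable names are illustrative:

```python
players = 180
rounds = 713            # one-minute rounds, so this is also minutes of play
distinct_pairs = 4585

hours_played = rounds / 60
rounds_per_player = rounds / players
pairs_per_round = distinct_pairs / rounds

print(f"{hours_played:.1f} hours of play")                 # 11.9 hours of play
print(f"{rounds_per_player:.1f} rounds per player")        # 4.0 rounds per player
print(f"{pairs_per_round:.1f} distinct pairs per round")   # 6.4 distinct pairs per round
```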
16. QUALITY THROUGH REPLICATION: Distinct gene-disease pairs collected: 4,585. Collected more than once: 482. Potential new annotations (do not appear in OMIM or PharmGKB): 224. Example: ABCB5 → Acute myeloid leukemia
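Slide 16 describes a two-stage filter: keep the pairs that more than one player submitted, then flag those absent from reference databases such as OMIM and PharmGKB as candidate new annotations. A minimal sketch with toy data; the function name, the threshold of two and the in-memory `known` set (standing in for real database lookups) are assumptions:

```python
from collections import Counter

def replication_filter(submissions, known, min_count=2):
    """Return (counts, replicated pairs, replicated pairs missing from `known`).

    submissions: iterable of (gene, disease) tuples, one per player answer
    known: set of (gene, disease) pairs already in reference databases
    """
    counts = Counter(submissions)
    replicated = {pair for pair, n in counts.items() if n >= min_count}
    novel = replicated - known
    return counts, replicated, novel

# Toy submissions: two replicated pairs and one pair seen only once.
subs = [("ABCB5", "acute myeloid leukemia"), ("ABCB5", "acute myeloid leukemia"),
        ("GENE1", "disease A"), ("GENE1", "disease A"), ("GENE2", "disease B")]
known = {("GENE1", "disease A")}
counts, replicated, novel = replication_filter(subs, known)
print(len(counts), len(replicated), novel)
# 3 2 {('ABCB5', 'acute myeloid leukemia')}
```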
17. ABCB5 IS RELATED TO ACUTE MYELOID LEUKEMIA
18. PROBLEMS WITH DIZEEZ • Dizeez actually punishes desired behavior (adding new, unknown associations) by not awarding points • It does not allow players to enter associations other than those in the provided list • GenESP fixes both problems
19. GenESP (modeled after the ESP Game). See: Ahn and Dabbish (2004) Labeling images with a computer game, SIGCHI.
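The ESP Game's core rule is output agreement: two players who cannot communicate describe the same item, and only terms both of them enter are recorded, which is why, per the speaker notes, you can't win by random typing. A minimal sketch of that rule; the lowercase normalization, the optional taboo list of already-collected terms and the function name are assumptions, not GenESP's actual implementation:

```python
def agreed_terms(guesses_a, guesses_b, taboo=frozenset()):
    """Return the terms both players entered, ignoring case, whitespace and taboo terms."""
    def normalize(terms):
        return {t.strip().lower() for t in terms}

    blocked = normalize(taboo)
    return (normalize(guesses_a) - blocked) & (normalize(guesses_b) - blocked)

# Toy round for one gene: "kinase" was already collected, so it is taboo.
a = ["Kinase", "cell cycle", "apoptosis"]
b = ["cell cycle ", "DNA repair", "kinase"]
print(agreed_terms(a, b, taboo={"kinase"}))  # {'cell cycle'}
```

A term typed by only one player is never recorded, so agreement between independent players acts as the quality filter.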
20. NO DATA YET, PLAY NOW! http://genegames.org
21. A RE-USABLE PATTERN: Gene → Disease; Gene → Function; Gene → Gene (relationship)
22. GENOMIC PREDICTORS: find patterns in samples labeled cancer or normal, then make predictions on new samples (cancer or normal?).
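Slide 22's workflow (learn from labeled cancer and normal samples, then classify new ones) can be illustrated with one of the simplest predictors, a nearest-centroid classifier. This is a stand-in chosen for brevity, not the method used in the talk, and the toy expression values are made up:

```python
from statistics import fmean

def train_centroids(samples, labels):
    """Average expression profile per class (e.g. 'cancer', 'normal')."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [fmean(col) for col in zip(*xs)] for y, xs in by_class.items()}

def predict(centroids, x):
    """Assign a new sample to the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Toy data: two genes per sample.
train = [[5.0, 1.0], [6.0, 1.5], [1.0, 4.0], [1.5, 5.0]]
labels = ["cancer", "cancer", "normal", "normal"]
cents = train_centroids(train, labels)
print(predict(cents, [5.5, 1.2]))  # cancer
```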
23. THE TRICK IS TO FIND THE RIGHT COMBINATION: Out of the 25,000+ genes, which work together the best? Purely computational approaches have trouble generalizing: they always find patterns, and it is hard to know when those patterns are real.
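Counting the candidate gene subsets with the binomial coefficient C(n, k) shows why exhaustive search over combinations is hopeless. The sketch uses n = 25,000 from slide 23; the particular k values are illustrative:

```python
from math import comb

N_GENES = 25_000  # "25,000+ genes" from slide 23

for k in (1, 2, 3, 5):
    print(f"{k}-gene combinations: {comb(N_GENES, k):,}")
```

Pairs alone number 312,487,500 and triples about 2.6 trillion. Every candidate scored against a limited set of samples is another chance to find a pattern that is not real, which is the generalization problem the slide describes.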
24. NETWORK GUIDED FOREST (NGF): Use the network to find good gene combinations. Dutkowski & Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology.
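Slide 24's idea is to let a protein interaction network, rather than blind search, propose which genes to combine. A minimal sketch of one way to do that: grow a connected gene set outward from a seed gene. The toy network, the function name and the random-growth rule are assumptions for illustration; this is not the published NGF algorithm:

```python
import random

def network_module(network, seed, size, rng):
    """Grow a connected gene set from `seed` by repeatedly adding a random neighbor of the set.

    network: dict mapping each gene to the set of its interaction partners.
    Stops early if the seed's connected component has fewer than `size` genes.
    """
    module = {seed}
    frontier = set(network.get(seed, ())) - module
    while len(module) < size and frontier:
        gene = rng.choice(sorted(frontier))   # sorted so a seeded rng is reproducible
        module.add(gene)
        frontier |= network.get(gene, set())
        frontier -= module
    return module

# Hypothetical undirected interaction network with two components.
net = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"},
       "E": {"F"}, "F": {"E"}}
print(network_module(net, "A", 3, random.Random(0)))
```

In a network-guided forest, each such module could then supply the candidate features for one decision tree, so that every split draws on genes known to interact.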
25. THE TRICK IS TO FIND THE RIGHT COMBINATION
26. ‘COMBO’: FIND THE RIGHT COMBINATION OF GENES TO BUILD A PHENOTYPE PREDICTOR
27. HUMAN GUIDED FOREST (HGF): Let COMBO players build decision modules. http://i9606.blogspot.com/2012/04/human-guided-forests-hgf.html
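Slide 27 replaces the network with people: COMBO players assemble the decision modules. A minimal sketch of how such modules might be combined into a forest by majority vote. The threshold-rule representation of a module, the gene names and the values are hypothetical; see the linked blog post for the actual design:

```python
from collections import Counter

def module_vote(module, sample):
    """One hypothetical player-built module: a list of (gene, threshold) rules.

    The module calls 'cancer' when more than half of its genes are above their thresholds.
    """
    above = sum(sample[gene] > threshold for gene, threshold in module)
    return "cancer" if above * 2 > len(module) else "normal"

def forest_predict(modules, sample):
    """Combine the modules by simple majority vote (use an odd number of modules to avoid ties)."""
    votes = Counter(module_vote(m, sample) for m in modules)
    return votes.most_common(1)[0][0]

m1 = [("G1", 2.0), ("G2", 1.0)]
m2 = [("G3", 0.5)]
m3 = [("G1", 3.0), ("G4", 1.0), ("G2", 0.5)]
sample = {"G1": 2.5, "G2": 1.5, "G3": 0.2, "G4": 2.0}
print(forest_predict([m1, m2, m3], sample))  # cancer (two of three modules vote cancer)
```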
28. 150 billion hours... [Chart: hours of human labor. Empire State Building: 7M; one year of Solitaire: 9B; one year of games: 150B.] (McGonigal J. Reality is broken: why games make us better and how they can change the world. New York: Penguin Press; 2011.)
29. THE END. Thanks to: Andrew Su, Salvatore Loguercio. More information at: http://genegames.org, http://sulab.org/, bgood@scripps.edu, and our poster!
30. GO IS NOT KEEPING UP: 112 publications/hour (37 more by the end of this talk); >21 million articles indexed in PubMed. [Chart: total GO annotations created, 2001 to 2011; y-axis 0 to 1,200,000.]
31. ANOTHER MAJOR ANNOTATION PROBLEM: Annotate all the images on the web (example labels: dog, firehose, drinking, sprinkler).
32. A SUCCESSFUL MODEL
33. ESP GAME RESULTS, first 3 months (2003): • 13,630 players added 1,271,451 labels to 293,760 images • became http://images.google.com/imagelabeler/ • since scaled up to more than a hundred thousand players and tens of millions of images labeled. Ahn and Dabbish (2004) Labeling images with a computer game, SIGCHI.
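Normalizing the first-three-month totals on slide 33 shows the per-player and per-image effort behind the ESP Game's output. The variable names are illustrative:

```python
players, labels, images = 13_630, 1_271_451, 293_760  # ESP Game, first 3 months (slide 33)

labels_per_player = labels / players
labels_per_image = labels / images

print(f"{labels_per_player:.1f} labels per player")  # 93.3 labels per player
print(f"{labels_per_image:.1f} labels per image")    # 4.3 labels per image
```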