Crowdsourcing Gene Annotation
Anurag Priyam
Biocuration - Crowdsourcing Gene Annotation
Published in: Technology
  • In the last 10 years the cost of sequencing a genome has gone down from 100M USD to 10K USD.
  • Sequencing genomes is now inexpensive. At 10K USD, roughly four months of a PhD student’s salary, anyone can sequence an organism and start studying it.
  • Deriving a set of genes from the genome (gene prediction) can be done ab initio or by looking for similarity to known sequences.
  • Other examples: incorrect translation start. A truncated or overextended exon corresponds to an incorrect splice donor or acceptor site.
  • Studying gene family evolution: to determine whether gene duplication occurred. RNAseq analyses: because you can only compare the expression of genes you know. For these analyses, gene models must be inspected and refined.
  • Problems of this scale, difficult for computers but easily approachable with human intelligence, have been successfully crowdsourced.
  • The combined time, effort, and intelligence of people (real people) has been successfully used to solve “hard” problems.
  • Hanny’s Voorwerp was discovered in 2007 by Dutch school teacher Hanny van Arkel while she was participating as an amateur volunteer in the GalaxyZoo project.
  • http://fold.it/portal/info/about#folditpub. At least four publications in high-impact-factor journals, leading to the discovery of ….
  • Perhaps. What would the challenges be?
  • We understand it might not be possible to lower the entry barrier far enough that anybody can start contributing, so we start by “inviting” undergraduate biologists as beta-testers. Biology undergraduates know the underlying biology but are new to sequence curation. If somebody asks: a professor can give a few sequence curation tasks as a homework assignment. A b
  • Instill a sense of contribution to the greater good. Instill curiosity in scientifically inclined kids, thereby bringing in more contributions. Active recruitment of peers on the social network.
  • Dr. Yannick Wurm and I, back at Queen Mary, University of London, are trying to build a system to crowdsource manual curation. We have set up a home page at afra.sbcs.qmul.ac.uk to explain what we are doing and will continue to post updates there as we make progress.
  • http://dynamicgene.dnalc.org/annotation/annotation.html

    1. Crowdsourcing Gene Annotation. Anurag Priyam
    2. Sequencing cost
       • Sequencing genomes is now inexpensive.
       • Many, many genomes are now being sequenced.
    3. Gene prediction
       • ab initio: search the genome for signs
       • sequence similarity based: search the genome for known sequences
    4. Gene prediction is challenging. Some examples:
       • missing exon
       • truncated or overextended exon
       • gene split into several gene predictions (e.g. if introns are very large)
       • merged genes (two adjacent gene models are predicted to be a single “megagene”)
    5. Incorrect gene prediction is problematic
       • studying gene family evolution
       • RNAseq analyses
       • molecular evolution analyses
    6. Manual curation
       • yields the best gene models
       • is time consuming
       • plausible for large communities (e.g. Human, C. elegans)
       • but what if a small lab sequenced their favorite bug’s genome?
    7. Crowdsourcing
    8. GalaxyZoo
    9. Foldit
    10. Foldit players contribute to real science!
       • Christopher B Eiben et al (2012) Increased Diels-Alderase activity through backbone remodelling guided by Foldit players. Nature Biotechnology.
       • Firas Khatib et al (2011) Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural and Molecular Biology.
       • Firas Khatib et al (2011) Algorithm discovery by protein folding game players. Proceedings of the National Academy of Sciences.
    11. Crowdsourcing works
       • GalaxyZoo volunteers have discovered real galaxies.
       • Foldit players have guided real scientific questions.
    12. Can we crowdsource gene model curation?
    13. Challenges
       • recruiting contributors
       • retaining contributors
       • ensuring quality gene models
    14. Lower the entry barrier to contribution
       • contributors refine one gene model at a time
       • present gene models based on the user’s experience: beginners see easy-to-curate models
       • tutorial or learning tasks
       • assisting UI
    15. Social network
       • Passive recruitment: post curation activities of contributors to the social network.
          • Cathy contributed to cancer research by refining three gene models. Can you help too?
          • Mike helped researchers understand how ant societies are organised by refining two gene models.
          • Amos earned the “expert gene curator badge” by curating 1000 gene models.
       • Active recruitment of friends on the social network.
    16. Challenges
       • recruiting contributors
       • retaining contributors
       • ensuring quality gene models
    17. Retaining contributors
       • Learning experience
       • Helping science
       • Prestige & pride:
          • points and badges
          • being featured on our leaderboard
          • acknowledgement or coauthorship in publications
          • responsibility: “senior” contributors are
             • asked to arbitrate between conflicting submissions of junior contributors
             • asked to curate a specific set of genes (developing expertise)
    18. Challenges
       • recruiting contributors
       • retaining contributors
       • ensuring quality gene models
    19. [Workflow diagram] Begin → Create initial tasks → Needs curation → Curate / Being curated / Submit (three parallel branches) → Auto-check. Inconsistent: create “review” task. Consistent: create next required task → Done.
    20. Ensure quality gene models
       • make tasks small & simple
       • beginners are trained
       • redundant curation
       • review of conflicts by experienced users
    21. Work in progress
       • gene prediction: MAKER2
       • gene visualization & editing: JBrowse (WebApollo)
       • http://afra.sbcs.qmul.ac.uk
       • Our code: Ruby, Sinatra, DataMapper, jQuery
    22. Summary
       • many emerging model organisms are being studied
       • gene prediction hasn’t caught up yet
       • manual curation requires a huge amount of time
       • crowdsourcing exists
       • crowdsourcing works, even in science
       • there are many challenges
       • work in progress
    23. http://afra.sbcs.qmul.ac.uk. Thanks: Dr. Yannick Wurm, Dr. Mitchell E. Skinner, Dr. Mark Yandell
    24. Task
    25. Recruiting alternatives
       • Force upon students: curriculum (learn / practical)
       • Pay people
    26. Summing up
       • 180+ eukaryotic genomes and more coming
       • gene prediction hasn’t caught up
       • best gene models are manually curated
       • manual curation can take hours to days
          • curating a full genome can take years
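
Slides 19 and 20 describe redundant curation followed by an automatic check: agreeing submissions move the gene model on to its next task, while conflicting ones create a "review" task for experienced users. The sketch below is one possible way to express that check. It is written in Python for brevity and is not the project's Ruby code; the exon-coordinate representation, the `min_agreement` threshold, and every name in it are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: a gene model is reduced to a tuple of (start, end) exon coordinates.
Exons = Tuple[Tuple[int, int], ...]


@dataclass(frozen=True)
class Submission:
    contributor: str
    exons: Exons


def auto_check(submissions: List[Submission], min_agreement: int = 2) -> str:
    """Return 'consistent' only if all submissions are identical and there
    are at least `min_agreement` of them; otherwise 'inconsistent'."""
    if not submissions:
        return "inconsistent"
    counts = Counter(s.exons for s in submissions)
    _, votes = counts.most_common(1)[0]
    # Any rival model, or too few votes, counts as a conflict.
    if len(counts) == 1 and votes >= min_agreement:
        return "consistent"
    return "inconsistent"


def next_task(gene_id: str, submissions: List[Submission]) -> Dict[str, str]:
    """Decide which task to create once the auto-check has run."""
    if auto_check(submissions) == "consistent":
        return {"gene": gene_id, "type": "next_required"}
    # Conflicts go to a review task, intended for senior contributors.
    return {"gene": gene_id, "type": "review"}


agree = [
    Submission("cathy", ((100, 250), (400, 520))),
    Submission("mike", ((100, 250), (400, 520))),
]
conflict = agree + [Submission("amos", ((100, 250), (410, 520)))]

print(next_task("gene_1", agree)["type"])     # next_required
print(next_task("gene_1", conflict)["type"])  # review
```

Requiring exact agreement is the strictest possible rule; a real system might instead accept a majority or tolerate small coordinate differences, which the slides do not specify.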

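
Slide 14 proposes presenting gene models according to a contributor's experience, so that beginners see models that are easy to curate. A minimal sketch of such a selection rule follows, again in Python rather than the project's Ruby. The difficulty score, the experience thresholds (10 and 100 completed tasks), and all names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneModelTask:
    gene_id: str
    difficulty: int  # hypothetical score: 1 = easy, 3 = hard


@dataclass
class Contributor:
    name: str
    completed_tasks: int


def max_difficulty(contributor: Contributor) -> int:
    """Map experience to the hardest task a contributor may see.
    The thresholds are illustrative assumptions."""
    if contributor.completed_tasks < 10:
        return 1
    if contributor.completed_tasks < 100:
        return 2
    return 3


def pick_task(contributor: Contributor,
              queue: List[GeneModelTask]) -> Optional[GeneModelTask]:
    """Return the hardest task the contributor is allowed to see,
    or None if nothing in the queue is suitable."""
    limit = max_difficulty(contributor)
    eligible = [t for t in queue if t.difficulty <= limit]
    # Hardest eligible first, so easy models stay available for beginners.
    return max(eligible, key=lambda t: t.difficulty, default=None)


queue = [GeneModelTask("g1", 1), GeneModelTask("g2", 3), GeneModelTask("g3", 2)]
print(pick_task(Contributor("beginner", 0), queue).gene_id)  # g1
print(pick_task(Contributor("expert", 500), queue).gene_id)  # g2
```

Handing experienced contributors the hardest eligible model is one design choice among several; it keeps easy models in the queue for newcomers and for tutorial or learning tasks.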