Crowdsourcing Gene Annotation
Anurag Priyam
Sequencing cost



• Sequencing genomes is now inexpensive.

• Many many genomes are now being sequenced.
Gene prediction




ab initio
Search genome for signs


Sequence similarity based
Search genome for known sequences
Gene prediction is challenging
Some examples:

• missing exon

• truncated or overextended exon

• gene split into several gene predictions (e.g. if
  introns are very large)

• merged genes (two adjacent gene models are
  predicted to be a single “megagene”)
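The failure modes listed above can be made concrete by comparing a predicted gene model against a trusted reference. The sketch below is only an illustration: representing exons as `(start, end)` tuples and the function name `compare_exons` are assumptions made for this example, not part of any tool named in these slides. It covers missing, truncated, and overextended exons; split and merged genes would need gene-level information that is left out here.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is an inclusive (start, end) pair.
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """Return True if two exons share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_exons(predicted: List[Exon], reference: List[Exon]) -> List[str]:
    """Describe how a predicted gene model differs from a reference model."""
    issues = []
    for ref in reference:
        hits = [p for p in predicted if overlaps(p, ref)]
        if not hits:
            issues.append(f"missing exon {ref}")
            continue
        for p in hits:
            if p == ref:
                continue
            # Prediction starts late or ends early relative to the reference.
            if p[0] > ref[0] or p[1] < ref[1]:
                issues.append(f"truncated exon {p} (expected {ref})")
            # Prediction starts early or ends late relative to the reference.
            if p[0] < ref[0] or p[1] > ref[1]:
                issues.append(f"overextended exon {p} (expected {ref})")
    return issues
```

For example, `compare_exons([(100, 200), (300, 380)], [(100, 200), (300, 400), (500, 600)])` reports one truncated exon and one missing exon.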
Incorrect Gene Prediction is Problematic


• studying gene family evolution

• RNAseq analyses

• molecular evolution analyses
Manual curation


• yields the best gene models

• is time consuming

• plausible for large communities (e.g. Human, C.
  elegans)

• but what if a small lab sequenced their favorite bug’s
  genome?
Crowdsourcing
GalaxyZoo
Foldit
Foldit players contribute to real science!
• Christopher B Eiben et al (2012) Increased Diels-Alderase
  activity through backbone remodeling guided by Foldit
  players. Nature Biotechnology.

• Firas Khatib et al (2011) Crystal structure of a monomeric
  retroviral protease solved by protein folding game players.
  Nature Structural and Molecular Biology.

• Firas Khatib et al (2011) Algorithm discovery by protein
  folding game players. Proceedings of the National Academy of
  Sciences.
Crowdsourcing works

• GalaxyZoo volunteers have discovered real
  galaxies.

• Foldit players have guided real scientific
  questions.
Can we crowdsource gene model curation?
Challenges


• recruiting contributors

• retaining contributors

• ensuring quality gene models
Lower the entry barrier to contribution

• contributors refine one gene model at a time

• present gene models based on the user’s experience –
  beginners see easy-to-curate models

• tutorial or learning tasks

• assisting UI
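One way to read "present gene models based on the user’s experience" is as a simple task-selection rule. The sketch below is a guess at how that could work, not a description of the actual system: the `Task` class, the 1 to 5 difficulty scale, and the rule of unlocking one level per 10 completed curations are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    gene_id: str
    difficulty: int  # hypothetical scale: 1 (easy) to 5 (hard)


def next_task(tasks: List[Task], completed: int) -> Optional[Task]:
    """Pick the hardest open task the contributor is ready for."""
    # Assumed rule: one extra difficulty level per 10 completed curations,
    # capped at the top of the scale.
    max_level = min(5, 1 + completed // 10)
    suitable = [t for t in tasks if t.difficulty <= max_level]
    if not suitable:
        return None
    return max(suitable, key=lambda t: t.difficulty)
```

Under these assumptions a newcomer only ever sees difficulty-1 models, while someone with 25 completed curations is offered models up to difficulty 3.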
Social network

• Passive recruitment: post curation activities of
  contributors to social network.
  • Cathy contributed to cancer research by refining three
    gene models. Can you help too?
  • Mike helped researchers understand how ant societies
    are organised by refining two gene models.
  • Amos earned the “expert gene curator badge” by curating
    1000 gene models.

• Active recruitment of friends on social network.
Challenges


• recruiting contributors

• retaining contributors

• ensuring quality gene models
Retaining contributors
Learning experience
Helping science
Prestige & pride:
• points and badges.
• being featured on our leaderboard.
• acknowledgement or coauthorship in publication
• responsibility: “senior” contributors are
   • asked to arbitrate between conflicting submissions of junior
     contributors.
   • asked to curate a specific set of genes (developing expertise)
Challenges


• recruiting contributors

• retaining contributors

• ensuring quality gene models
Curation workflow (diagram)
Begin → Create initial tasks → Needs curation
Needs curation → Curate → Being curated → Submit (shown three times in parallel)
Submit → Auto-check
• Inconsistent: create “review” task
• Consistent: create next required task
Done
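The workflow in the diagram can be sketched as a small state machine with an automatic consistency check over redundant submissions. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, a gene model is reduced to a tuple of `(start, end)` exons, and "consistent" is taken to mean that all submissions are identical, which the slides do not specify.

```python
from enum import Enum
from typing import List, Optional, Tuple

# Hypothetical gene model: a tuple of (start, end) exon coordinates.
Exons = Tuple[Tuple[int, int], ...]


class State(Enum):
    NEEDS_CURATION = "needs curation"
    BEING_CURATED = "being curated"
    SUBMITTED = "submitted"


class CurationTask:
    """One contributor's attempt at curating a gene model."""

    def __init__(self, gene_id: str) -> None:
        self.gene_id = gene_id
        self.state = State.NEEDS_CURATION
        self.result: Optional[Exons] = None

    def curate(self) -> None:
        self.state = State.BEING_CURATED

    def submit(self, exons: Exons) -> None:
        if self.state is not State.BEING_CURATED:
            raise RuntimeError("task must be picked up before it is submitted")
        self.result = exons
        self.state = State.SUBMITTED


def auto_check(tasks: List[CurationTask]) -> str:
    """Compare redundant submissions for the same gene model."""
    if not tasks:
        raise ValueError("no tasks to check")
    if any(t.state is not State.SUBMITTED for t in tasks):
        return "waiting"
    results = {t.result for t in tasks}
    # Assumption: identical submissions count as consistent; any
    # disagreement is routed to a review task.
    return "consistent" if len(results) == 1 else "review"
```

A stricter or looser notion of agreement (for example, a majority vote) could be swapped into `auto_check` without changing the task states.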
Ensure quality gene models
• make tasks small & simple

• beginners are trained

• redundant curation

• review of conflicts by experienced users.
Work in progress

• gene prediction: MAKER2

• gene visualization & editing: JBrowse (WebApollo)

• http://afra.sbcs.qmul.ac.uk



• Our code: Ruby, Sinatra, DataMapper, jQuery
Summary

• many emerging model organisms are being studied
• gene prediction hasn’t caught up yet
• manual curation requires a huge amount of time
• crowdsourcing exists
• crowdsourcing works – even in science
• there are many challenges
• work in progress
http://afra.sbcs.qmul.ac.uk

Thanks
Dr. Yannick Wurm
Dr. Mitchell E. Skinner
Dr. Mark Yandell
Task
Recruiting alternatives

• Make it part of the student curriculum (learning / practicals)

• Pay people
Summing up

• 180+ eukaryotic genomes and more coming

• gene prediction hasn’t caught up

• best gene models are manually curated

• manual curation can take hours to days
  • curating a full genome can take years
Biocuration - Crowdsourcing Gene Annotation


Editor's Notes

  1. In the last 10 years the cost of sequencing a genome has gone down from 100M USD to 10K USD.
  2. Sequencing genomes is now inexpensive. 10K USD – roughly 4 months of a PhD student’s salary – anyone can sequence an organism and start studying it.
  3. Deriving a set of genes from the genome, gene prediction, can be done ab-initio or by looking
  4. Incorrect translation start. Truncated / overextended = incorrect splice donor / acceptor sites.
  5. Studying gene family evolution -> to determine if gene duplication occurred. RNAseq analyses -> because you can only compare the expression of genes you know. For analyses, gene models must be inspected and refined.
  6. Problems of this scale, difficult for computers but easily approachable with human intelligence, have been successfully crowdsourced.
  7. Where the combined time, and effort, and intelligence of people – real people – has been successfully utilized to solve “hard” problems.
  8. Hanny’s Voorwerp, discovered in 2007 by Dutch school teacher Hanny van Arkel while she was participating as an amateur volunteer in the GalaxyZoo project.
  9. http://fold.it/portal/info/about#folditpub. At least four publications, in high impact factor journals, leading to the discovery of ….
  10. Perhaps. What would the challenges be?
  11. We understand it might not be possible to lower the entry barrier so that anybody can start contributing, so we start by “inviting” undergrad biologists as beta-testers. Biology undergrads would know the underlying biology, but would be new to sequence curation. If somebody asks: one professor can give a few sequence curation tasks as a homework assignment. A b
  12. Instill a sense of contribution to the greater good. Instill curiosity in scientifically oriented kids, thereby bringing in more contributions. Active recruitment of peers on social networks.
  13. Dr. Yannick Wurm and I, back at Queen Mary, University of London, are trying to build a system to crowdsource manual curation. We have set up a home page at afra.sbcs.qmul.ac.uk to explain what we are doing and will continue to post updates there as we make progress.
  14. http://dynamicgene.dnalc.org/annotation/annotation.html