Lock - PomBase community curation


PomBase Community Curation: A Fast Track to Capture Expert Knowledge. Antonia Lock, Kim Rutherford, Midori Harris, Mark McDowall, Paul Kersey, Stephen Oliver, Jurg Bahler and Valerie Wood.

Presented at the 5th International Biocuration Conference, hosted by PIR in Washington, DC, April 2-4, 2012.

Published in: Education, Technology

  1. PomBase Community Curation: A Fast Track to Capture Expert Knowledge
     Antonia Lock
  2. The S. pombe Community
     • Medium-sized research community
         • >200 labs; 1300 subscribers to the mailing list
         • Close-knit
     • GeneDB S. pombe model organism database set up in 2004
         • Maintained by one person (V. Wood)
         • Mainly GO annotation
         • Problem:
             • Needed to support additional types of data
             • Too many publications to curate with the available manpower
  3. The Community Curation Initiative
     • Pilot study in 2009
         • Highly successful
         • 29/44 responded (no follow-up for non-responders)
         • ~360 new annotations
         • Annotations were generally of high quality; errors were easy to spot
         • Enabled a dialogue between authors and curators
         • Process must be simplified
         • Need for a simple tool in which to do the curation, instead of a complicated Word document
     • 2010: Wellcome Trust grant
         • To develop and implement a community curation tool
         • Also to develop a new fission yeast database, ‘PomBase’, which will support a range of additional data types not previously captured in GeneDB
  4. Data captured in GeneDB vs. PomBase

     Data type | Ontology | GeneDB | PomBase
     Function/Process/Component | GO | ✔ | ✔
     Protein modifications | Protein Modification Ontology | - | ✔
     Phenotypes | FYPO (Fission Yeast Phenotype Ontology) | Some | ✔
     Interactions | BioGRID | BioGRID | ✔
     Gene expression | In-house vocabulary | - | ✔
     Misc features (disease associations, complementation…) | In-house vocabulary | ✔ | ✔

     The increased breadth makes community curation even more important
  5. Phenotype Ontology
     • User survey 2007: phenotypes were identified as the single most desirable information type not supported by GeneDB S. pombe.
     • Need for a pre-composed Fission Yeast Phenotype Ontology
         • Easier for community curation
         • Needed greater specificity of terms than that offered by existing phenotype ontologies
     • Term is accompanied by two types of information:
         • Allele description: deletion, overexpression or mutation
         • Experimental conditions where appropriate
     • Combination of different ontologies used to create formal definitions
         • E.g. PATO, ChEBI, GO
         • Example: PATO "resistance to" + ChEBI "thiabendazole" → FYPO "resistance to thiabendazole"
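  6. GO Term Extensions

     GO ID | Term | Extension | Evidence | With/From | Source
     GO:0004674 | Protein serine/threonine kinase activity | has_substrate pom1 | IDA | | Yoon HJ et al. (2006)
     GO:0004674 | Protein serine/threonine kinase activity | has_substrate rum1 | IDA | | Noguchi E et al. (2002)
     GO:0004674 | Protein serine/threonine kinase activity | has_substrate rbp80 | IDA | | Holig K et al. (2009)
     GO:0004674 | Protein serine/threonine kinase activity | has_substrate sin1 | IDA | | Jang YJ et al. (1997)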
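
The GO Term Extensions slide pairs one GO term with several has_substrate extensions. A minimal sketch of how such rows might be held in code is given here. The class and field names are illustrative assumptions, not the PomBase or CANTO schema, and the GO ID is written in the seven-digit form GO:0004674 on the assumption that this is the identifier the slide intends.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Extension:
    """One annotation extension: a relation plus its target (hypothetical model)."""
    relation: str  # e.g. "has_substrate"
    target: str    # e.g. a gene name such as "pom1"


@dataclass(frozen=True)
class GoAnnotation:
    """One row of the slide's table (field names are assumptions)."""
    go_id: str
    term: str
    evidence: str
    source: str
    extension: Optional[Extension] = None

    def extension_text(self) -> str:
        """Render the extension as relation(target), or '' if there is none."""
        if self.extension is None:
            return ""
        return f"{self.extension.relation}({self.extension.target})"


GO_ID = "GO:0004674"
TERM = "protein serine/threonine kinase activity"

# Substrate and source pairs copied from the slide.
ROWS = [
    ("pom1", "Yoon HJ et al. (2006)"),
    ("rum1", "Noguchi E et al. (2002)"),
    ("rbp80", "Holig K et al. (2009)"),
    ("sin1", "Jang YJ et al. (1997)"),
]

annotations = [
    GoAnnotation(GO_ID, TERM, "IDA", source, Extension("has_substrate", target))
    for target, source in ROWS
]


def substrates(items: List[GoAnnotation], go_id: str) -> List[str]:
    """Return the sorted has_substrate targets recorded for one GO term."""
    return sorted(
        a.extension.target
        for a in items
        if a.go_id == go_id
        and a.extension is not None
        and a.extension.relation == "has_substrate"
    )
```

Keeping the extension as structured data, rather than free text appended to the term name, is what allows a query such as `substrates(annotations, GO_ID)` to be answered without parsing prose.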
  7. Why Not a Wiki?
     • Traditionally, biologists would study one gene/protein
         • Individual text-based gene pages were an ideal format
     • Many techniques used today generate gene lists
         • Enrichment analyses identify patterns in the data set, e.g. are certain processes common to the group of genes?
         • Need annotations to controlled vocabularies to make efficient, computerized comparisons
         • A wiki, essentially free text, does not provide this
     • All annotations are supported by evidence
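
To illustrate why gene lists call for controlled-vocabulary annotations, here is a minimal sketch of an over-representation (enrichment) test using a one-sided hypergeometric tail. The function names, gene names and term labels are made up for illustration. The slides do not say which enrichment method or tool is meant, so this is one common choice and not the authors' method.

```python
from math import comb
from typing import Dict, Iterable, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(
    gene_list: Iterable[str],
    annotations: Dict[str, Set[str]],
    background: Iterable[str],
) -> Dict[str, float]:
    """Return {term: p-value} for every term annotated to a gene in gene_list."""
    population = set(background)
    study = set(gene_list) & population
    terms = {t for g in study for t in annotations.get(g, set())}
    results = {}
    for term in terms:
        K = sum(1 for g in population if term in annotations.get(g, set()))
        k = sum(1 for g in study if term in annotations.get(g, set()))
        results[term] = hypergeom_sf(k, len(population), K, len(study))
    return results


# Toy data: ten hypothetical genes, three of which share one term.
toy_background = [f"gene{i}" for i in range(10)]
toy_annotations = {
    "gene0": {"term:cell_cycle"},
    "gene1": {"term:cell_cycle"},
    "gene2": {"term:cell_cycle"},
    "gene3": {"term:transport"},
}
toy_result = enrichment(["gene0", "gene1", "gene2"], toy_annotations, toy_background)
```

The counting step only works because every gene is tagged with the same term identifier. With free-text wiki pages, two authors describing the same process in different words would not be counted together.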
  8. What Will the Community Curate?
     • Data that can be captured by the formal vocabularies used in PomBase
         • GO (including extensions)
         • Protein modifications (including residue information)
         • Phenotypes (including alleles and conditions)
         • Interactions
     • Mostly pre-composed terms
         • Extensions will be captured by prompting where relevant
         • I.e. the community will not be expected to know when to use these
  9. The Community Annotation Tool - CANTO
     • Final stages of development
         • Developed by Kim Rutherford
         • Already in use by the PomBase curators
         • We are involving the community at this stage through review of curated (recent) publications
     • Provides a web-based interface
         • Can be used as a stand-alone application (provides annotations in GAFs)
         • Pipelines are in place for direct loading into Chado
         • Chado (GMOD project) is a database schema for handling biological data
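
Since the tool is said to provide annotations in GAFs, the following sketch shows what composing one GAF row could look like, assuming the 17-column, tab-separated GAF 2.0 layout. It is not CANTO's exporter: the function name `gaf_line` is hypothetical, and the gene identifiers, symbol, reference and date in the example are placeholders, not real records.

```python
# Column order of a GAF 2.0 row (assumed version; the slides do not name one).
GAF_COLUMNS = (
    "db",
    "db_object_id",
    "db_object_symbol",
    "qualifier",
    "go_id",
    "db_reference",
    "evidence_code",
    "with_from",
    "aspect",
    "db_object_name",
    "db_object_synonym",
    "db_object_type",
    "taxon",
    "date",
    "assigned_by",
    "annotation_extension",
    "gene_product_form_id",
)


def gaf_line(**fields: str) -> str:
    """Join the given fields into one tab-separated row; missing columns stay empty."""
    unknown = set(fields) - set(GAF_COLUMNS)
    if unknown:
        raise ValueError(f"unknown GAF columns: {sorted(unknown)}")
    return "\t".join(fields.get(column, "") for column in GAF_COLUMNS)


# Placeholder values throughout: SPXXXX.00, geneA and PMID:0000000 are not real.
example = gaf_line(
    db="PomBase",
    db_object_id="SPXXXX.00",
    db_object_symbol="geneA",
    go_id="GO:0004674",
    db_reference="PMID:0000000",
    evidence_code="IDA",
    aspect="F",  # F = molecular function
    db_object_type="protein",
    taxon="taxon:4896",  # NCBI taxon ID for S. pombe
    date="20120402",
    assigned_by="PomBase",
    annotation_extension="has_substrate(PomBase:SPXXXX.01)",
)
```

A fixed column order with empty strings for absent values keeps every row the same width, which is what lets a downstream loader read the file without guessing which field is which.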
  10. 5 Easy Steps to Broad Curation of Data: A Walk-through
  11. Step 1: Add your genes
  12. The main page: choose a gene to get started…
  13. Step 2: Choose the type of annotation
  14. Step 3: Find the correct term
  15. Child terms are suggested…
  16. Step 4: Add the evidence
  17. Step 5: Review, extend and transfer
  18. Quality Control and Consistency Checking
      • Professional curators are needed not just for curation support, but also for quality control and consistency checking.
  19. Help?!
      • There is always a visible help button
  20. Benefits of Community Curation
      • Researchers can curate ‘from home’ immediately following publication
          • First-pass annotations are obtained quickly, so data will quickly appear in the database
          • Expert knowledge, coupled with quality control by curators, makes for powerful, accurate annotations
          • Controlled annotations can be loaded from the tool directly into our database
      • The bottleneck is how quickly professional curators can check annotations, not how fast we can obtain them
      • Frees up time for us to clear the backlog of papers
  21. Benefits to the Researcher
      • Greater visibility of publication
          • Annotations propagated to GO, BioGRID, Ensembl, NCBI, UniProt…
          • Increased citation index?
      • A greater understanding of ontologies
          • Will be able to use them better to support their research
  22. Future Directions
      • ~3 months until official launch of CANTO
          • Multi-gene phenotypes
          • Extensions (restricted usage for specific terms and relations)
          • More help features and descriptive boxes
      • Longer term
          • Making the tool easily configurable for other organisms
          • Making the tool available to other communities
  23. Acknowledgements
      • The PomBase team:
          • Val Wood
          • Midori Harris
          • Kim Rutherford
          • Mark McDowall
          • Antonia Lock
      • PIs:
          • Jurg Bahler (UCL)
          • Steve Oliver (Cambridge)
          • Paul Kersey (EBI Hinxton)
      • Funded by the Wellcome Trust