BioDBCore: Current Status and Next Developments


Presented at the BioHackathon 2013, Tokyo, Japan

Published in: Technology, Education

  1. Pascale Gaudet
     Chair, International Society for Biocuration
     Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics
     BioDBCore: Current status and future developments
  2. International Society for Biocuration (ISB): Mission statement
     • Define and promote the work of biocurators
     • Foster connections with user communities to ensure that databases and accompanying tools meet specific user needs
     • Promote communication and exchanges between curators: meetings, workshops
     • Encourage best practices by providing documentation on standards and annotation procedures
  3. The need
     • Databases: improve data integration from published papers
     • Journals: link to database objects
     • Researchers: identify resources
     • Grant submitters: enforce data sharing plans
  4. Goals
     1) Gather information required to provide a general overview of the database landscape and compare the various resources
     2) Encourage consistency and interoperability
     3) Promote the use of standards
     4) Provide guidance for users
     5) Maximize the collective impact of the resources
  5. BioDBCore group organization
     • Led by Pascale Gaudet (ISB/SIB) and Philippe Rocca-Serra (BioSharing)
     • Guidelines proposed in 2011 paper
     • Implemented in 2012 NAR database issue
  6. Use cases
     • Show all resources of type database which use MIMARKS guidelines
     • Show all resources where John Smith is involved
     • Show all resources for mouse phenotypes
     • Where can I submit my data?
     And also:
     • Guidance for grants' data sharing policies
     • Improving integration of data from papers into databases
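
The query-style use cases above can be sketched as filters over a small in-memory registry. This is only an illustration: the `Resource` class, its field names, and the three sample entries are hypothetical and are not part of BioDBCore or BioSharing.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical, simplified registry entry (not the BioDBCore schema)."""
    name: str
    resource_type: str
    standards: frozenset = frozenset()
    people: frozenset = frozenset()
    topics: frozenset = frozenset()
    accepts_submissions: bool = False


# Made-up sample entries, used only to exercise the queries below.
REGISTRY = [
    Resource("ExampleDB A", "database", frozenset({"MIMARKS"}),
             frozenset({"John Smith"}), frozenset({"metagenomics"}), True),
    Resource("ExampleDB B", "database", frozenset({"GFF3"}),
             frozenset({"Jane Doe"}), frozenset({"mouse phenotypes"}), False),
    Resource("ExampleTool C", "tool", frozenset({"MIMARKS"}),
             frozenset({"John Smith"}), frozenset(), False),
]


def select(resources, predicate):
    """Return the names of all resources that satisfy the predicate."""
    return [r.name for r in resources if predicate(r)]


# "Show all resources of type database which use MIMARKS guidelines"
uses_mimarks = select(
    REGISTRY, lambda r: r.resource_type == "database" and "MIMARKS" in r.standards)

# "Show all resources where John Smith is involved"
john_smith = select(REGISTRY, lambda r: "John Smith" in r.people)

# "Show all resources for mouse phenotypes"
mouse = select(REGISTRY, lambda r: "mouse phenotypes" in r.topics)

# "Where can I submit my data?"
submit_here = select(REGISTRY, lambda r: r.accepts_submissions)
```

Each use case becomes a one-line predicate, which is the main point of describing every resource with the same descriptors.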
  7. Collaborative philosophy
     • Many groups/resources have been providing registries and lists of databases
     • Often not funded, not maintained
     • BioDBCore seeks to collaborate with all interested parties to work together to provide a more permanent solution to database descriptions
  8. BioDBCore: Participating groups
     • BioDB100
     • BioSharing
     • BioCatalogue
     • Bioinformatics Links Directory
     • Biositemaps
     • CASIMIR
     • MIBBI
     • MIRIAM
     • Model Organism Databases
     • NIF registry
     • … and your group!
  9. BioDBCore descriptors
     1. Database name
     2. Main resource URL
     3. Contact information (e-mail; postal mail)
     4. Date resource established (year)
     5. Conditions of use (Free, or type of license)
     6. Scope: data types captured, curation policy, standards used
     7. Standards: MIs, Data formats, Terminologies
     8. Taxonomic coverage
     9. Data accessibility/output options
     10. Data release frequency
     11. Versioning policy and access to historical files
     12. Documentation available
     13. User support options
     14. Data submission policy
     15. Relevant publications
     16. Resource's Wikipedia URL
     17. Tools available
  10. Database name: dictyBase
     Main resource URL: http://dictybase.org
     Contact information: dictybase@northwestern.edu
     Date resource established (year): 2003
     Conditions of use: Free
     Scope: Data types captured: Genome sequence; gene models including CDS and predicted proteins; Phenotypes; Gene Ontology annotations; Functional annotation (gene product names); Gene nomenclature; Strains; Plasmids; Free text descriptions; Domains (via InterPro); Orthologs (via OrthoMCL and InParanoid); Protein subcellular location (via Swiss-Prot); Protein existence (via Swiss-Prot); Citations; Researchers database
  11. Curation policy: manual curation
     Standards: MIs, Data formats, Terminologies: Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature
     Data formats: FASTA, OBO, GAF, GFF3 (standard)
     Taxonomic coverage (use NCBI Taxid): D. discoideum (44689) including all strains [PRIMARY]; also some genome/EST/gene model info for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)
     Data accessibility/output options: HTML, text, database reports
     Data release frequency: curators work on the live database; weekly data dumps (sequences) or monthly (other data)
     Versioning policy/access to historical files: no versioning, but access to historical files is possible
  12. Documentation available / user support options: documents, email, web form
     Data submission policy: Data from published literature. Some HTP data corresponding to published analyses is incorporated
     Relevant publications: PMID: 18974179, PMID: 14681427
     Resource's Wikipedia URL:
     Tools available: BLAST, BioMart, Generic Genome Browser, TextPresso, MetaCyc (dictyCyc)
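
As a rough illustration, the descriptors and the dictyBase example above could be held in a typed record and serialized. This is a minimal sketch, assuming a hypothetical `BioDBCoreRecord` class with a subset of the descriptors; the field names are invented for this example, while the values are taken from the dictyBase slides.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class BioDBCoreRecord:
    """Hypothetical container for a subset of the BioDBCore descriptors."""
    name: str
    url: str
    contact: str
    year_established: int
    conditions_of_use: str
    curation_policy: str
    data_formats: list = field(default_factory=list)
    taxids: dict = field(default_factory=dict)  # NCBI Taxid -> species
    publications: list = field(default_factory=list)
    wikipedia_url: Optional[str] = None  # no value given on the slides
    tools: list = field(default_factory=list)


dictybase = BioDBCoreRecord(
    name="dictyBase",
    url="http://dictybase.org",
    contact="dictybase@northwestern.edu",
    year_established=2003,
    conditions_of_use="Free",
    curation_policy="manual curation",
    data_formats=["FASTA", "OBO", "GAF", "GFF3"],
    taxids={
        44689: "D. discoideum",
        5786: "D. purpureum",
        13642: "P. pallidum",
        261658: "D. fasciculatum",
    },
    publications=["PMID:18974179", "PMID:14681427"],
    tools=["BLAST", "BioMart", "Generic Genome Browser", "TextPresso",
           "MetaCyc (dictyCyc)"],
)

# Serialize to JSON; integer Taxid keys become strings in the output.
as_json = json.dumps(asdict(dictybase), indent=2)
```

Leaving `wikipedia_url` as `None` keeps the unfilled descriptor visible rather than guessing a value.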
  13. Implementation of BioDBCore at BioSharing (Many thanks to Philippe RS!)
  14. BioDBCore announcement
     Published in Nucleic Acids Research database issue 2011 and in the DATABASE journal
  15. Implementation plan
     • Goal: BioDBCore data public and linked
     • Community-aware approach: reuse existing resources
     • Current data model: RDF based on categories from Biositemaps, MIRIAM, NIF, Dublin Core, Darwin Core
     • Defined extension mechanisms
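
To give a feel for what an RDF-based record might look like, here is a small sketch that writes a few descriptors as N-Triples using plain string formatting. It does not reproduce the actual BioDBCore data model: the subject IRI and the `EX` namespace are hypothetical, and the choice of Dublin Core and FOAF properties is an assumption made for this example.

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/biodbcore#"  # hypothetical namespace, for illustration


def literal(value):
    """Format a value as an N-Triples string literal with basic escaping."""
    text = str(value)
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'


def to_ntriples(subject, statements):
    """Serialize (predicate, value, is_iri) tuples as N-Triples lines."""
    lines = []
    for predicate, value, is_iri in statements:
        obj = f"<{value}>" if is_iri else literal(value)
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


# Values from the dictyBase example; the subject IRI is made up.
triples = to_ntriples(
    "http://example.org/biodbcore/dictybase",
    [
        (DCTERMS + "title", "dictyBase", False),
        (FOAF + "homepage", "http://dictybase.org", True),
        (EX + "conditionsOfUse", "Free", False),
    ],
)
print(triples)
```

Reusing existing vocabularies for common properties and reserving a project namespace for the rest is one way to read the "reuse existing" and "extension mechanisms" bullets.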
  16.
  17. Example BioDBCore entry (1/2)
  18. Example BioDBCore entry (2/2)
  19. Creating, editing, maintaining entries
     • Until now: records are manually created from data provided by NAR at publication of the Database issue and by the Life Sciences Registry (Michel Dumontier and Nick Juty)
       - Those mostly come as xls files that need to be manually entered
       - Close to 200 records have been entered out of over 2,000 obtained
  20. Beyond maintenance at BioSharing
     Ideally, database providers would keep their BioDBCore record up to date
     • Claim ownership
       - A database provider can now (in theory) maintain their own BioDBCore record
     Encouraging best practices
     • DATABASE and Nucleic Acids Research journals: Editors in chief request BioDBCore information from submitters
     • ISB seal of approval
     • BioDB100, launched at InCoB 2011: examples of 100 well-annotated databases
  21. What's next?
     • Continue to extend participating groups and journals
     • Refine scope
     • Integrate semantic support
     • Develop querying system
     • Implement validation tests
     • Set up mechanisms for exchange of data among collaborating groups (in BioDBCore RDF format, or other)
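
The "validation tests" item above could start as simple per-record checks. The sketch below is hypothetical: the slides do not define any validation rules, so the set of required descriptors, the dictionary keys, and the plausible year range are all assumptions made for illustration.

```python
from datetime import date
from urllib.parse import urlparse

# Assumed minimal set of required descriptors (not specified in the slides).
REQUIRED = ("name", "url", "contact", "year_established", "conditions_of_use")


def validate(record):
    """Return a list of human-readable problems found in a record dict."""
    problems = []
    for key in REQUIRED:
        if not record.get(key):
            problems.append(f"missing descriptor: {key}")

    url = record.get("url")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an http(s) URL: {url}")

    year = record.get("year_established")
    if year is not None:
        # The lower bound of 1950 is an arbitrary assumption for this sketch.
        if not (isinstance(year, int) and 1950 <= year <= date.today().year):
            problems.append(f"implausible year: {year}")

    for taxid in record.get("taxids", ()):
        if not (isinstance(taxid, int) and taxid > 0):
            problems.append(f"invalid NCBI Taxid: {taxid}")
    return problems


# Values from the dictyBase example pass all checks.
good = {
    "name": "dictyBase",
    "url": "http://dictybase.org",
    "contact": "dictybase@northwestern.edu",
    "year_established": 2003,
    "conditions_of_use": "Free",
    "taxids": [44689, 5786, 13642, 261658],
}

# A deliberately broken, made-up record.
bad = {"name": "X", "url": "ftp://x", "year_established": 3000, "taxids": [-1]}
```

Returning a list of problems instead of raising on the first one lets a submitter fix a record in a single pass.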
  22. Identifying or developing semantic support
     • Policies and guidelines: BioSharing
     • Publications and taxon info:
     • Authors: ORCID (will also implement organizations)
     • Keywords/database scope: NIF when possible
     Identifying resources is preferable to developing them!
  23. For BioHackathon 2013
     • Evaluate need for BioDBCore in today's landscape of metadatabase resources
     • Evaluate further collaboration opportunities
     • Set up a better system for creating and maintaining BioDBCore records
     • Identify/develop ontologies pertinent to BioDBCore
  24. Acknowledgements
     • Philippe Rocca-Serra
     • Susanna-Assunta Sansone
     • Eamonn Maguire
     • Alejandra Gonzalez Beltran
     • International Society for Biocuration
     • Michael Galperin
     • David Landsman
     • Francis Ouellette
     • Oxford University Press collaborators