Pascale Gaudet
Chair, International Society for Biocuration
Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics
BioDBCore: Current status and future developments

International Society for Biocuration: Mission statement
• Define and promote the work of biocurators
• Foster connections with user communities to ensure that databases and accompanying tools meet specific user needs
• Promote communication and exchanges between curators: meetings, workshops
• Encourage best practices by providing documentation on standards and annotation procedures

The need
• Databases: improve data integration from published papers
• Journals: link to database objects
• Researchers: identify resources
• Grant submitters: enforce data sharing plans

Goals
1) Gather information required to provide a general overview of the database landscape and compare the various resources
2) Encourage consistency and interoperability
3) Promote the use of standards
4) Provide guidance for users
5) Maximize the collective impact of the resources

BioDBcore group organization
• Led by Pascale Gaudet (ISB/SIB) and Philippe Rocca-Serra (BioSharing)
• Guidelines proposed in 2011 paper
• Implemented in 2012 NAR database issue

Use cases
• Show all resources of type database which use MIMARKS guidelines
• Show all resources where John Smith is involved
• Show all resources for mouse phenotypes
• Where can I submit my data?
and also:
• Guidance for grants' data sharing policies
• Improving integration of data from papers into databases

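The query-style use cases above could be served by simple filters over structured records. The sketch below is only an illustration: the three records, their field names, and the `find` helper are all hypothetical and are not part of the BioDBCore specification.

```python
# Hypothetical, simplified records; names and field names are invented for illustration.
RECORDS = [
    {"name": "ExampleDB A", "type": "database", "standards": ["MIMARKS"],
     "contacts": ["John Smith"], "taxa": ["mouse"],
     "data_types": ["phenotypes"], "accepts_submissions": True},
    {"name": "ExampleDB B", "type": "database", "standards": ["Gene Ontology"],
     "contacts": ["Jane Doe"], "taxa": ["mouse"],
     "data_types": ["sequences"], "accepts_submissions": False},
    {"name": "ExampleTool C", "type": "tool", "standards": ["MIMARKS"],
     "contacts": ["John Smith"], "taxa": [],
     "data_types": [], "accepts_submissions": False},
]


def find(records, **criteria):
    """Return the names of records that satisfy every criterion.

    A list-valued field matches when it contains the requested value;
    any other field matches on equality.
    """
    def matches(record, key, wanted):
        value = record.get(key)
        if isinstance(value, list):
            return wanted in value
        return value == wanted

    return [r["name"] for r in records
            if all(matches(r, k, v) for k, v in criteria.items())]


# One call per use case listed on the slide.
print(find(RECORDS, type="database", standards="MIMARKS"))
print(find(RECORDS, contacts="John Smith"))
print(find(RECORDS, taxa="mouse", data_types="phenotypes"))
print(find(RECORDS, accepts_submissions=True))
```

A real registry would more likely expose such queries through a search interface or a query language over linked data; the point here is only that each use case maps to a filter on one or two descriptors.
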
Collaborative philosophy
• Many groups/resources have been providing registries and lists of databases
• Often not funded, not maintained
• BioDBCore seeks to collaborate with all interested parties to work together to provide a more permanent solution to database descriptions

BioDBcore: Participating groups
• BioDB100
• BioSharing
• BioCatalogue
• Bioinformatics Links Directory
• Biositemaps
• CASIMIR
• MIBBI
• MIRIAM
• Model Organism Databases
• NIF registry
• … and your group!

BioDBCore descriptors
1. Database name
2. Main resource URL
3. Contact information (e-mail; postal mail)
4. Date resource established (year)
5. Conditions of use (Free, or type of license)
6. Scope: data types captured, curation policy, standards used
7. Standards: MIs, Data formats, Terminologies
8. Taxonomic coverage
9. Data accessibility/output options
10. Data release frequency
11. Versioning policy and access to historical files
12. Documentation available
13. User support options
14. Data submission policy
15. Relevant publications
16. Resource's Wikipedia URL
17. Tools available

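As a rough illustration, the 17 descriptors could be held in a single typed record. The attribute names, types, and defaults below are assumptions made for this sketch, not an official BioDBCore schema; the sample values are taken from the dictyBase example in these slides.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class BioDBCoreRecord:
    # Attribute names are illustrative; numbers refer to the descriptor list.
    name: str                                                      # 1
    url: str                                                       # 2
    contact: str                                                   # 3
    year_established: Optional[int] = None                         # 4
    conditions_of_use: str = ""                                    # 5
    scope: List[str] = field(default_factory=list)                 # 6
    standards: List[str] = field(default_factory=list)             # 7
    taxonomic_coverage: List[str] = field(default_factory=list)    # 8
    data_accessibility: List[str] = field(default_factory=list)    # 9
    release_frequency: str = ""                                    # 10
    versioning_policy: str = ""                                    # 11
    documentation: str = ""                                        # 12
    user_support: List[str] = field(default_factory=list)          # 13
    submission_policy: str = ""                                    # 14
    publications: List[str] = field(default_factory=list)          # 15
    wikipedia_url: str = ""                                        # 16
    tools: List[str] = field(default_factory=list)                 # 17


def missing_descriptors(record):
    """List the descriptors that are still empty in a record."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "", [])]


# Partial record using values shown in the dictyBase example.
dicty = BioDBCoreRecord(
    name="dictyBase",
    url="http://dictybase.org",
    contact="firstname.lastname@example.org",
    year_established=2003,
    conditions_of_use="Free",
)
print(missing_descriptors(dicty))
```

Listing the empty descriptors is one simple way to show a database provider what remains to be filled in.
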
Database name: dictyBase
Main resource URL: http://dictybase.org
Contact information: firstname.lastname@example.org
Date resource established (year): 2003
Conditions of use: Free
Scope: Data types captured: Genome sequence; gene models including CDS and predicted proteins; Phenotypes; Gene Ontology annotations; Functional annotation (gene product names); Gene nomenclature; Strains; Plasmids; Free text descriptions; Domains (via InterPro); Orthologs (via OrthoMCL and inParanoid); Protein subcellular location (via Swiss-Prot); Protein existence (via Swiss-Prot); Citations; Researchers database

Curation policy: manual curation
Standards: MIs, Data formats, Terminologies: Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature
Data formats: FASTA, OBO, GAF, GFF3 (standard)
Taxonomic coverage (use NCBI Taxid): D. discoideum (44689), including all strains [PRIMARY]; also some genome/EST/gene model info for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)
Data accessibility/output options: HTML, text, database reports
Data release frequency: curators work on the live database; weekly data dumps (sequences) or monthly (other data)
Versioning policy/access to historical files: no versioning, but access to historical files is possible

Documentation available: http://dictybase.org/FAQ/HelpFilesIndex.html
User support options: documents, email, web form
Data submission policy: Data from published literature. Some HTP data corresponding to published analyses is incorporated
Relevant publications: PMID: 18974179, PMID: 14681427
Resource's Wikipedia URL: http://en.wikipedia.org/wiki/DictyBase
Tools available: BLAST, BioMart, Generic Genome Browser, TextPresso, MetaCyc (dictyCyc)

Implementation of BioDBCore at BioSharing
(Many thanks to Philippe RS!)

BioDBcore announcement
Published in Nucleic Acids Research database issue 2011 and in the DATABASE journal

Implementation plan
• Goal: BioDBCore data public and linked
• Community-aware approach: reuse existing stuff
• Current data model: RDF based on categories from Biositemaps, MIRIAM, NIF, Dublin Core, Darwin Core
• Defined extension mechanisms

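To give a feel for what an RDF-based record might look like, the sketch below serializes a few descriptors as N-Triples using only the Python standard library. The slides do not show the actual BioDBCore RDF vocabulary, so the choice of Dublin Core and FOAF properties, the mapping from descriptors to properties, and the `example.org` subject URI are all assumptions made for illustration.

```python
# Real, widely used namespaces; how BioDBCore maps onto them is assumed here.
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Values taken from the dictyBase example; key names are illustrative.
RECORD = {
    "name": "dictyBase",
    "url": "http://dictybase.org",
    "year_established": 2003,
    "conditions_of_use": "Free",
}


def literal(value):
    """Quote a plain string literal (handles backslashes and quotes only)."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(subject, record):
    """Serialize a few descriptors of one record as N-Triples lines."""
    triples = [
        (DC + "title", literal(record["name"])),
        (FOAF + "homepage", f"<{record['url']}>"),
        (DCTERMS + "created", literal(record["year_established"])),
        (DC + "rights", literal(record["conditions_of_use"])),
    ]
    return "\n".join(f"<{subject}> <{p}> {o} ." for p, o in triples)


# Hypothetical subject URI for the record.
print(to_ntriples("http://example.org/biodbcore/dictybase", RECORD))
```

Reusing properties from established vocabularies, rather than minting new ones, is in line with the "reuse existing" approach stated on the slide.
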
Creating, editing, maintaining entries
• Until now: records are manually created from data provided by NAR at publication of Database issue and the Life Sciences Registry (Michel Dumontier and Nick Juty)
- Those mostly come as xls files that need to be manually entered
- Close to 200 records have been entered out of over 2,000 obtained

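Part of the manual entry step could in principle be scripted. The sketch below assumes the spreadsheets have first been exported to CSV and that they carry the column headers shown; both are assumptions, since the slides do not describe the layout of the files. Rows lacking a name or URL are set aside for manual review rather than imported.

```python
import csv
import io

# Hypothetical CSV export of a spreadsheet; column headers are assumed.
SAMPLE_EXPORT = """Database name,Main resource URL,Date established
dictyBase,http://dictybase.org,2003
ExampleDB,,
"""

# Mapping from the assumed spreadsheet columns to record field names.
COLUMN_TO_FIELD = {
    "Database name": "name",
    "Main resource URL": "url",
    "Date established": "year_established",
}


def load_records(handle):
    """Read rows into records; return (records, skipped line numbers)."""
    records, skipped = [], []
    # Data rows start on line 2 because line 1 holds the column headers.
    for line_number, row in enumerate(csv.DictReader(handle), start=2):
        record = {COLUMN_TO_FIELD[column]: (value or "").strip()
                  for column, value in row.items()
                  if column in COLUMN_TO_FIELD}
        if record.get("name") and record.get("url"):
            records.append(record)
        else:
            skipped.append(line_number)
    return records, skipped


records, skipped = load_records(io.StringIO(SAMPLE_EXPORT))
print(records)
print("Lines needing manual review:", skipped)
```
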
Beyond maintenance at BioSharing
Ideally, database providers would keep their BioDBCore record up to date
• Claim ownership
- A database provider can now (in theory) maintain their own BioDBCore record
Encouraging best practices
• DATABASE and Nucleic Acids Research journals: Editors-in-chief request BioDBCore information from submitters
• ISB seal of approval
• BioDB100, launched at InCoB 2011: examples of 100 well-annotated databases

What's next?
• Continue to extend participating groups and journals
• Refine scope
• Integrate semantic support
• Develop querying system
• Implement validation tests
• Set up mechanisms for exchange of data among collaborating groups (in BioDBCore RDF format, or other)

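The slides do not say what the planned validation tests would check. As one possible starting point, the sketch below applies a few format checks to a record; the rules, field names, and messages are all assumptions made for illustration.

```python
import datetime
import re
from urllib.parse import urlparse


def validate(record):
    """Return a list of problems found in a record (empty list if none)."""
    problems = []

    # Assumed minimal set of mandatory descriptors.
    for key in ("name", "url", "contact"):
        if not record.get(key):
            problems.append(f"missing {key}")

    url = record.get("url")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url is not an http(s) address")

    year = record.get("year_established")
    this_year = datetime.date.today().year
    if year is not None:
        if not isinstance(year, int) or not 1900 <= year <= this_year:
            problems.append("year_established is not a plausible year")

    # NCBI taxonomy identifiers are positive integers.
    for taxid in record.get("taxon_ids", []):
        if not re.fullmatch(r"[1-9]\d*", str(taxid)):
            problems.append(f"taxon id {taxid!r} is not numeric")

    return problems


good = {"name": "dictyBase", "url": "http://dictybase.org",
        "contact": "firstname.lastname@example.org",
        "year_established": 2003, "taxon_ids": [44689]}
bad = {"name": "", "url": "dictybase.org",
       "year_established": 3003, "taxon_ids": ["D. discoideum"]}

print(validate(good))
print(validate(bad))
```
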
Identifying or developing semantic support
• Policies and guidelines: BioSharing
• Publications and taxon info: identifiers.org
• Authors: ORCID (will also implement organizations)
• Keywords/database scope: NIF when possible
Identifying resources is preferable to developing them!

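A short sketch of how existing identifier schemes could be reused for publications, taxa, and authors. It assumes the compact `prefix:accession` form of identifiers.org URIs and the `pubmed` and `taxonomy` prefixes; the helper names are hypothetical. The ORCID check follows the ISO 7064 MOD 11-2 check digit that ORCID iDs carry, and the sample iD is ORCID's documented test identifier, not a real author.

```python
def identifiers_org_uri(prefix, accession):
    """Build an identifiers.org URI in compact-identifier form."""
    return f"https://identifiers.org/{prefix}:{accession}"


def orcid_checksum_ok(orcid_id):
    """Check the final character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid_id.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


# PMID and taxon ID taken from the dictyBase example record.
print(identifiers_org_uri("pubmed", "18974179"))
print(identifiers_org_uri("taxonomy", "44689"))
print(orcid_checksum_ok("0000-0002-1825-0097"))
```
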
For BioHackathon 2013
• Evaluate need for BioDBCore in today's landscape of metadatabase resources
• Evaluate further collaboration opportunities
• Set up a better system for creating and maintaining BioDBCore records
• Identify/develop ontologies pertinent to BioDBCore

Acknowledgements
Philippe Rocca-Serra
Susanna-Assunta Sansone
Eamonn Maguire
Alejandra Gonzalez Beltran
International Society for Biocuration
Michael Galperin
David Landsman
Francis Ouellette
Oxford University Press collaborators