BioDBCore: Current Status and Next Developments

Pascale Gaudet
Chair, International Society for Biocuration
Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics

International Society for Biocuration:
Mission statement
•  Define and promote the work of biocurators
•  Foster connections with user communities to
ensure that databases and accompanying
tools meet specific user needs
•  Promote communication and exchanges
between curators: meetings and workshops
•  Encourage best practices by providing
documentation on standards and annotation
procedures

The need
• Databases: improve data integration from
published papers
• Journals: link to database objects
• Researchers: identify resources
• Grant submitters: enforce data sharing plans

Goals
1)  Gather information required to provide a
general overview of the database
landscape and compare the various
resources
2)  Encourage consistency and interoperability
3)  Promote the use of standards
4)  Provide guidance for users
5)  Maximize the collective impact of the
resources

BioDBCore group organization
•  Led by Pascale Gaudet (ISB/SIB) and
Philippe Rocca-Serra (BioSharing)
•  Guidelines proposed in a 2011 paper
•  Implemented in the 2012 NAR Database issue

Use cases
•  Show all resources of type database that use
MIMARKS guidelines
•  Show all resources where John Smith is involved
•  Show all resources for mouse phenotypes
•  Where can I submit my data?
and also:
•  Guidance for grants’ data sharing policies
•  Improving integration of data from papers into
databases
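The first four use cases are, in effect, filters over structured records. The sketch below illustrates that idea under stated assumptions: the records, their field names ("type", "standards", "contacts", "scope", "accepts_submissions") and the resource names are all hypothetical and are not the BioDBCore schema.

```python
# Hypothetical, simplified BioDBCore-style records; field names are illustrative only.
records = [
    {"name": "ExampleMarkerDB", "type": "database",
     "standards": ["MIMARKS"], "contacts": ["John Smith"],
     "scope": ["marker gene sequences"], "accepts_submissions": True},
    {"name": "ExamplePhenoDB", "type": "database",
     "standards": ["Gene Ontology"], "contacts": ["Jane Doe"],
     "scope": ["mouse phenotypes"], "accepts_submissions": False},
    {"name": "ExampleTool", "type": "tool",
     "standards": ["MIMARKS"], "contacts": ["John Smith"],
     "scope": ["sequence analysis"], "accepts_submissions": False},
]

def databases_using(records, standard):
    """Resources of type 'database' that declare the given standard."""
    return [r["name"] for r in records
            if r["type"] == "database" and standard in r["standards"]]

def resources_with_contact(records, person):
    """Resources that list the given person as a contact."""
    return [r["name"] for r in records if person in r["contacts"]]

def resources_for_scope(records, keyword):
    """Resources whose scope entries contain the keyword."""
    return [r["name"] for r in records
            if any(keyword in s for s in r["scope"])]

def submission_targets(records):
    """Answers 'Where can I submit my data?'"""
    return [r["name"] for r in records if r["accepts_submissions"]]

print(databases_using(records, "MIMARKS"))               # ['ExampleMarkerDB']
print(resources_with_contact(records, "John Smith"))     # ['ExampleMarkerDB', 'ExampleTool']
print(resources_for_scope(records, "mouse phenotypes"))  # ['ExamplePhenoDB']
print(submission_targets(records))                       # ['ExampleMarkerDB']
```

Such queries only work if every resource fills in the same descriptors consistently, which is the motivation for a shared minimal description.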

Collaborative philosophy
•  Many groups/resources have been providing
registries and lists of databases
•  Often unfunded and not maintained
•  BioDBCore seeks to collaborate with all interested
parties to work together to provide a more
permanent solution to database descriptions

BioDBCore: Participating groups
•  BioDB100
•  BioSharing
•  BioCatalogue
•  Bioinformatics Links Directory
•  Biositemaps
•  CASIMIR
•  MIBBI
•  MIRIAM
•  Model Organism Databases
•  NIF registry
•  … and your group!

BioDBCore descriptors

1.  Database name
2.  Main resource URL
3.  Contact information (e-mail; postal mail)
4.  Date resource established (year)
5.  Conditions of use (Free, or type of license)
6.  Scope: data types captured, curation policy,
standards used
7.  Standards: MIs, Data formats, Terminologies
8.  Taxonomic coverage
9.  Data accessibility/output options
10.  Data release frequency
11.  Versioning policy and access to historical files
12.  Documentation available
13.  User support options
14.  Data submission policy
15.  Relevant publications
16.  Resource’s Wikipedia URL
17.  Tools available
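One way to picture the 17 descriptors is as a single record type. The sketch below is a hypothetical Python representation: the field names and types are assumptions made for illustration, not an official BioDBCore serialization. It also lists which descriptors a partially filled record still lacks.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional

@dataclass
class BioDBCoreRecord:
    # Hypothetical field names; the trailing numbers follow the descriptor list.
    database_name: str                                              # 1
    main_resource_url: str                                          # 2
    contact_information: str = ""                                   # 3
    date_established: Optional[int] = None                          # 4 (year)
    conditions_of_use: str = ""                                     # 5
    scope: List[str] = field(default_factory=list)                  # 6
    standards: List[str] = field(default_factory=list)              # 7
    taxonomic_coverage: List[int] = field(default_factory=list)     # 8 (NCBI taxids)
    data_accessibility: List[str] = field(default_factory=list)     # 9
    data_release_frequency: str = ""                                # 10
    versioning_policy: str = ""                                     # 11
    documentation: str = ""                                         # 12
    user_support_options: List[str] = field(default_factory=list)   # 13
    data_submission_policy: str = ""                                # 14
    relevant_publications: List[str] = field(default_factory=list)  # 15
    wikipedia_url: str = ""                                         # 16
    tools_available: List[str] = field(default_factory=list)        # 17

def missing_descriptors(record):
    """Names of descriptors left empty (None, empty string or empty list)."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "", [])]

rec = BioDBCoreRecord(
    database_name="dictyBase",
    main_resource_url="http://dictybase.org",
    contact_information="dictybase@northwestern.edu",
    date_established=2003,
    conditions_of_use="Free",
)
print(len(missing_descriptors(rec)))  # 12
```

Keeping only the name and URL mandatory mirrors the idea that a record can be created early and completed later.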

Database name: dictyBase
Main resource URL: http://dictybase.org
Contact information: dictybase@northwestern.edu
Date resource established (year): 2003
Conditions of use: Free
Scope (data types captured): genome sequence; gene models including CDS and predicted proteins; phenotypes; Gene Ontology annotations; functional annotation (gene product names); gene nomenclature; strains; plasmids; free-text descriptions; domains (via InterPro); orthologs (via OrthoMCL and InParanoid); protein subcellular location (via Swiss-Prot); protein existence (via Swiss-Prot); citations; researchers database

Curation policy: manual curation
Standards (MIs, data formats, terminologies): Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature
Data formats: FASTA, OBO, GAF, GFF3 (standard)
Taxonomic coverage (NCBI Taxonomy IDs): D. discoideum (44689), including all strains [PRIMARY]; also some genome/EST/gene model information for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)
Data accessibility/output options: HTML, text, database reports
Data release frequency: curators work on the 'live' database; data dumps are weekly (sequences) or monthly (other data)
Versioning policy/access to historical files: no versioning, but access to historical files is possible

Documentation available: http://dictybase.org/FAQ/HelpFilesIndex.html
User support options: documents, email, web form
Data submission policy: data from the published literature; some HTP data corresponding to published analyses is incorporated
Relevant publications: PMID: 18974179, PMID: 14681427
Resource's Wikipedia URL: http://en.wikipedia.org/wiki/DictyBase
Tools available: BLAST, BioMart, Generic Genome Browser, Textpresso, MetaCyc (dictyCyc)

Implementation of BioDBCore at
BioSharing (many thanks to Philippe Rocca-Serra!)

BioDBCore announcement
Published in the Nucleic Acids Research 2011 Database issue
and in the journal DATABASE

Implementation plan
•  Goal: BioDBCore data public and linked
•  Community-aware approach: reuse existing
resources
•  Current data model: RDF based on
categories from Biositemaps, MIRIAM, NIF,
Dublin Core, Darwin Core
•  Defined extension mechanisms
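To make "RDF based on categories from Dublin Core" concrete, the sketch below serializes a few dictyBase descriptors as N-Triples using DCMI Metadata Terms. The choice of terms (title, created, identifier), the key-to-term mapping, and the subject IRI under example.org are assumptions for illustration; the actual BioDBCore data model defines its own mapping across several vocabularies.

```python
DCTERMS = "http://purl.org/dc/terms/"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"

# Hypothetical mapping: record key -> (Dublin Core term, optional datatype IRI)
DC_MAPPING = {
    "database_name": ("title", None),
    "date_established": ("created", XSD_GYEAR),
    "main_resource_url": ("identifier", None),
}

def literal(value, datatype=None):
    """Serialize a value as an N-Triples literal, escaping backslashes, quotes, CR and LF."""
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"' + (f"^^<{datatype}>" if datatype else "")

def record_to_ntriples(subject_iri, record):
    """Emit one triple per mapped descriptor present in the record."""
    triples = []
    for key, (term, datatype) in DC_MAPPING.items():
        if key in record:
            triples.append(f"<{subject_iri}> <{DCTERMS}{term}> "
                           f"{literal(record[key], datatype)} .")
    return "\n".join(triples)

dictybase = {"database_name": "dictyBase", "date_established": 2003,
             "main_resource_url": "http://dictybase.org"}
print(record_to_ntriples("http://example.org/biodbcore/dictybase", dictybase))
```

Because unmapped keys are simply skipped, new descriptors can be added to the mapping later, which is one simple reading of "defined extension mechanisms".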

Creating, editing, maintaining entries
•  Until now: records have been created manually from data
provided by NAR at publication of the Database issue
and by the Life Sciences Registry (Michel Dumontier and
Nick Juty)
- These mostly arrive as Excel (.xls) files that must be
entered manually
- Close to 200 records have been entered
out of more than 2,000 obtained
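Part of this manual step could be scripted. The sketch below assumes the spreadsheets are first exported to CSV (the Python standard library cannot read .xls files) and that the column headers match the descriptor names; both the headers and the mapping to record keys are hypothetical.

```python
import csv
import io

# Hypothetical mapping from spreadsheet column headers to record keys.
COLUMN_MAP = {
    "Database name": "database_name",
    "Main resource URL": "main_resource_url",
    "Contact information": "contact_information",
    "Date resource established (year)": "date_established",
}

def rows_to_records(csv_text):
    """Turn CSV rows into record dicts, skipping rows without a database name."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {COLUMN_MAP[h]: (v or "").strip()
                  for h, v in row.items() if h in COLUMN_MAP}
        if not record.get("database_name"):
            continue  # blank or incomplete row: leave for manual review
        if record.get("date_established", "").isdigit():
            record["date_established"] = int(record["date_established"])
        records.append(record)
    return records

sample = """Database name,Main resource URL,Contact information,Date resource established (year)
dictyBase,http://dictybase.org,dictybase@northwestern.edu,2003
,,,
"""
print(rows_to_records(sample))
```

Rows that cannot be parsed are skipped rather than guessed, so a curator still reviews them by hand.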

Beyond maintenance at BioSharing
Ideally, database providers would keep their BioDBCore
record up to date
•  Claim ownership
- A database provider can now (in theory) maintain their
own BioDBCore record
Encouraging best practices
•  DATABASE and Nucleic Acids Research journals:
Editors-in-chief request BioDBCore information from
submitters
•  ISB seal of approval
•  BioDB100: launched at InCoB 2011, examples of 100 well-
annotated databases

What’s next?
•  Continue to expand participating groups and journals
•  Refine scope
•  Integrate semantic support
•  Develop a querying system
•  Implement validation tests
•  Set up mechanisms for data exchange among
collaborating groups (in BioDBCore RDF format or
other formats)
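Validation tests could start with simple syntactic checks on individual descriptors. The sketch below is one possible set of rules, all of which are assumptions: the main URL must be an absolute http(s) URL, the founding year must fall between 1950 (an arbitrary lower bound) and the current year, taxonomy IDs must be positive integers, and publications must be written as "PMID: <digits>". The record keys are hypothetical.

```python
import re
from datetime import date
from urllib.parse import urlparse

PMID_PATTERN = re.compile(r"^PMID:\s*\d+$")

def validate(record):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    errors = []
    url = urlparse(record.get("main_resource_url", ""))
    if url.scheme not in ("http", "https") or not url.netloc:
        errors.append("main_resource_url is not an absolute http(s) URL")
    year = record.get("date_established")
    if not isinstance(year, int) or not 1950 <= year <= date.today().year:
        errors.append("date_established is not a plausible year")
    for taxid in record.get("taxonomic_coverage", []):
        if not isinstance(taxid, int) or taxid <= 0:
            errors.append(f"invalid NCBI taxid: {taxid!r}")
    for pub in record.get("relevant_publications", []):
        if not PMID_PATTERN.match(pub):
            errors.append(f"publication not in 'PMID: <digits>' form: {pub!r}")
    return errors

good = {"main_resource_url": "http://dictybase.org", "date_established": 2003,
        "taxonomic_coverage": [44689, 5786, 13642, 261658],
        "relevant_publications": ["PMID: 18974179", "PMID: 14681427"]}
bad = {"main_resource_url": "dictybase.org", "date_established": 3003}
print(validate(good))  # []
print(validate(bad))
```

Checks like these catch formatting slips; whether a URL actually resolves or a PMID exists would need network lookups, which are left out here.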

Identifying or developing
semantic support
•  Policies and guidelines: BioSharing
•  Publications and taxon info: identifiers.org
•  Authors: ORCID (will also implement
organizations)
•  Keywords/database scope: NIF when possible
Identifying resources is preferable to developing them!

For BioHackathon 2013
•  Evaluate the need for BioDBCore in today’s landscape
of metadatabase resources
•  Evaluate further collaboration opportunities
•  Set up a better system for creating and maintaining
BioDBCore records
•  Identify/develop ontologies pertinent to BioDBCore

Acknowledgements
Philippe Rocca-Serra
Susanna-Assunta Sansone
Eamonn Maguire
Alejandra Gonzalez Beltran
International Society for Biocuration
Michael Galperin
David Landsman
Francis Ouellette
Oxford University Press

collaborators
