A 10-minute presentation given in Denver, CO, on 15th September 2016 as part of the joint session of the IG Elixir Bridging Force, WG BioSharing Registry, WG Data Type Registries, and WG Metadata Standards Catalog at the Research Data Alliance 8th Plenary (part of International Data Week).
This presentation covers the proliferation of data, databases, and data standards in biomedicine, and how BioSharing can help inform and educate users about this landscape and the relationships between data, databases, and data standards.
The Diversity of Biomedical Data, Databases and Standards (Research Data Alliance (RDA) 8th plenary)
1. The Diversity of Biomedical
Data, Databases and
Standards
Peter McQuilton
BioSharing Content Lead
https://www.biosharing.org
@biosharing
IG Elixir Bridging Force, WG BioSharing Registry, WG Data Type Registries, WG Metadata Standards Catalog
International Data Week, RDA, Denver, 15th September, 2016
2. A growth in data, a growth in databases, a growth in standards
Number of databases in the NAR database issue, up to 2015 (from @AlexBateman1)
3. Data has to be structured for sharing – we need standards
Data/content standards:
• Structure, enrich and report the description of the datasets and the experimental context under which they were produced
• Facilitate the discovery, sharing, understanding and reuse of datasets
• Ensure all digital research outputs are Findable, Accessible, Interoperable and Reusable (FAIR)
4. Content standards – enablers
Formats: conceptual models, conceptual schemas, exchange formats, etc.
o Allow data to flow from one system to another
o e.g. FASTA
Terminologies: controlled vocabularies, taxonomies, thesauri, ontologies, etc.
o Use the same word and refer to the same ‘thing’
o e.g. Gene Ontology
Guidelines: minimum information reporting requirements, checklists
o Report the same core, essential information
o e.g. MIAME guidelines
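To make "allow data to flow from one system to another" concrete, here is a minimal sketch of reading FASTA, the exchange format named on this slide. It assumes the common convention of a `>` header line followed by one or more sequence lines; the function name `parse_fasta` and the example records are hypothetical and are not part of BioSharing or of any specific tool.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its joined sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical example data, for illustration only.
example = """>seq1 hypothetical example
ACGTAC
GTA
>seq2
TTGA
"""
records = parse_fasta(example.splitlines())
print(records)
```

Because both the producer and the consumer agree on this simple layout, any system that writes FASTA can hand its sequences to any system that reads it.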
5. Over 700 content standards in biomedical sciences
de jure: standard organizations
de facto: grass-roots groups
Nanotechnology Working Group
Guidelines: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
6. Diversity in Standards
Technologically-focused content standards
Biologically-focused content standards
Figure labels: Arrays, Scanning, Arrays & Scanning, Columns, Gels, MS, FTIR, NMR; transcriptomics, proteomics, metabolomics; plant biology, epidemiology, microbiology
Even if common features exist, e.g.:
- description of source biomaterial
- experimental design components
these are inconsistently duplicated
7. What is BioSharing?
A web-based, curated and searchable portal that monitors the development and
evolution of standards, their use in databases and the adoption of both in data
policies, to inform and educate the user community.
9. Complex and evolving landscape
Content standards (>700): formats, terminologies, guidelines
Databases, tools and services (>1000)
Data policies by funders, journals and other organizations (>100)
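The three kinds of record on this slide, and the links between them, can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names, and the example database and policy names are hypothetical and do not describe BioSharing's actual schema or API. MIAME, FASTA, and Gene Ontology are the example standards named earlier in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Standard:
    name: str
    kind: str  # "format", "terminology" or "guideline"


@dataclass
class Database:
    name: str
    implements: List[str] = field(default_factory=list)  # names of standards used


@dataclass
class Policy:
    name: str
    recommends: List[str] = field(default_factory=list)  # names of standards or databases


# Hypothetical records, for illustration only.
standards = [
    Standard("MIAME", "guideline"),
    Standard("FASTA", "format"),
    Standard("Gene Ontology", "terminology"),
]
databases = [
    Database("ExampleArrayDB", ["MIAME"]),
    Database("ExampleSequenceDB", ["FASTA", "Gene Ontology"]),
]
policies = [
    Policy("Example Journal Data Policy", ["MIAME", "ExampleSequenceDB"]),
]


def databases_implementing(standard_name: str, dbs: List[Database]) -> List[str]:
    """Names of databases that declare use of the given standard."""
    return [db.name for db in dbs if standard_name in db.implements]


def policies_recommending(resource_name: str, pols: List[Policy]) -> List[str]:
    """Names of policies that recommend the given standard or database."""
    return [p.name for p in pols if resource_name in p.recommends]


print(databases_implementing("FASTA", databases))
print(policies_recommending("MIAME", policies))
```

Keeping the links as explicit fields is what lets a user ask questions in either direction, such as which databases use a standard or which policies recommend a resource.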
11. What data do we capture?
NCBI Taxon
~1400 tags
Some hierarchy
Synonyms
4 axes: Process, Material, Datatype, Property
12. Grouping records for different use cases
Collections group together one or more types of resource by domain, project or organization.
Recommendations are a core set of resources that are selected and recommended by a funder or journal data policy.
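The two groupings described on this slide can be sketched in a few lines. This is a hedged illustration, not BioSharing's implementation: the record layout, the collection labels, the database name `ExampleArrayDB`, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical records; each may belong to one or more collections.
records = [
    {"name": "MIAME", "type": "standard",
     "collections": ["Example Transcriptomics Project"]},
    {"name": "FASTA", "type": "standard",
     "collections": ["Example Sequence Domain"]},
    {"name": "ExampleArrayDB", "type": "database",
     "collections": ["Example Transcriptomics Project"]},
]

# Hypothetical set of resources named in a funder or journal data policy.
policy_recommended: Set[str] = {"MIAME", "ExampleArrayDB"}


def build_collections(recs: List[dict]) -> Dict[str, List[str]]:
    """Group record names by collection label (domain, project or organization)."""
    grouped = defaultdict(list)
    for rec in recs:
        for label in rec["collections"]:
            grouped[label].append(rec["name"])
    return dict(grouped)


def build_recommendation(recs: List[dict], recommended: Set[str]) -> List[str]:
    """Core set of record names that a given policy selects and recommends."""
    return [rec["name"] for rec in recs if rec["name"] in recommended]


collections_by_label = build_collections(records)
recommendation = build_recommendation(records, policy_recommended)
print(collections_by_label)
print(recommendation)
```

A collection may mix standards and databases, while a recommendation is simply the subset of records that one policy names.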
16. “BioSharing and its interactive browser will allow us to
discover which databases and standards are not currently
included in our author guidelines, enabling us to regularly
monitor and refine our policies as appropriate, in support of
our mission to help our authors enhance the reproducibility
of their work.” – Holly Murray, F1000Research
More data
More interest in accessing/reusing that data
Greater need to structure and store the data
We need to map the landscape
Repositories
Standards
Tricky to integrate data: for example, medical experts may be interested in microbiology. Do they share standards?
Middle: ideally, standards would be developed with common elements shared across disciplines, and some standards should apply across technologies (e.g. arrays)
Not going to talk about functionality (searching, etc.)
Different stakeholders have different questions
Recommendations based on a 3rd party policy document
Mention Emma by name as PLOS data policy manager
This is the educational side