Susanna-Assunta Sansone,
Associate Director, Oxford e-Research Centre,
University of Oxford, UK
dx.doi.org/10.6084/m9.figshare.4055496.v1
@biosharing
bioCADDIE – DATS and CDEs Workshop, Bethesda, 8 May 2017
Formats, Terminologies, Guidelines
Common Data Elements
Types of content standards
Content standards: descriptors essential for the interpretation, verification, reproducibility and sharing of datasets
Guidelines: minimum information reporting requirements, checklists
o Report the same core, essential information
o e.g. MIAME guidelines
Terminologies: controlled vocabularies, taxonomies, thesauri, ontologies etc.
o Unambiguous identification and definition of concepts
o e.g. Gene Ontology
Formats: conceptual models, schemas, exchange formats etc.
o Define the structure and interrelation of information, and the transmission format
o e.g. FASTA
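As a minimal illustration, assuming a catalogue simply tags each record with one of the three types above, the classification could be modelled as follows. The names `StandardType`, `EXAMPLES` and `names_of_type` are hypothetical and are not BioSharing's own data model; the three examples are the ones named on the slide.

```python
from enum import Enum


class StandardType(Enum):
    """The three types of content standards (hypothetical modelling)."""
    GUIDELINE = "minimum information reporting requirements, checklists"
    TERMINOLOGY = "controlled vocabularies, taxonomies, thesauri, ontologies"
    FORMAT = "conceptual models, schemas, exchange formats"


# One example per type, as named on the slide.
EXAMPLES = {
    "MIAME": StandardType.GUIDELINE,
    "Gene Ontology": StandardType.TERMINOLOGY,
    "FASTA": StandardType.FORMAT,
}


def names_of_type(catalogue, wanted):
    """Return, sorted, the names of catalogue entries of the wanted type."""
    return sorted(name for name, kind in catalogue.items() if kind is wanted)


if __name__ == "__main__":
    for kind in StandardType:
        print(f"{kind.name}: {names_of_type(EXAMPLES, kind)}")
```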
Community-driven efforts, just a few examples (formats, terminologies, guidelines)
o de jure: standard organizations
o de facto: grass-roots groups
o e.g. Nanotechnology Working Group
Content standards in numbers (Formats, Terminologies, Guidelines)
Counts shown: 224, 115, 500+ (each with its source)
Guidelines, e.g. MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
Formats, e.g. SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA, CML, MITAB
Terminologies, e.g. AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
BioSharing: a web-based, curated and searchable portal that monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community
Map this complex and evolving landscape
o Content standards: formats, terminologies, guidelines
o Databases
o Data policies by funders, journals and other organizations
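One way to picture this landscape is as a small directed graph: policies point to the databases and standards they recommend, and databases point to the standards they implement. The sketch assumes exactly that structure; the node names, the node kinds and the relation labels `recommends`/`implements` are hypothetical, not BioSharing's own schema. It answers one question such a map makes easy: which policies are linked to a given standard, directly or through a database.

```python
from collections import defaultdict

# Hypothetical node kinds and records, for illustration only.
KIND = {
    "Policy A": "policy",
    "Policy B": "policy",
    "Database X": "database",
    "Database Y": "database",
    "Standard S1": "standard",
    "Standard S2": "standard",
}

# (source, relation, target); the relation names are assumptions.
EDGES = [
    ("Policy A", "recommends", "Database X"),
    ("Policy B", "recommends", "Standard S2"),
    ("Database X", "implements", "Standard S1"),
    ("Database X", "implements", "Standard S2"),
    ("Database Y", "implements", "Standard S2"),
]


def policies_linked_to(standard, edges=EDGES, kind=KIND):
    """Return policies that point at `standard` directly or through databases."""
    incoming = defaultdict(set)
    for source, _relation, target in edges:
        incoming[target].add(source)

    # Walk the edges backwards from the standard, collecting policies.
    found, seen, stack = set(), set(), [standard]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for source in incoming[node]:
            if kind[source] == "policy":
                found.add(source)
            else:
                stack.append(source)
    return sorted(found)


if __name__ == "__main__":
    print(policies_linked_to("Standard S1"))  # ['Policy A']
    print(policies_linked_to("Standard S2"))  # ['Policy A', 'Policy B']
```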
Using indicators to describe ‘status’
o Ready for use, implementation, or recommendation
o In development
o Status uncertain
o Deprecated as subsumed or superseded
All records are manually curated in-house and verified by the community behind each resource
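A minimal sketch of how the four status indicators might be attached to curated records, assuming each record carries one status and, when deprecated, an optional pointer to the standard that subsumes or supersedes it. The `Record` fields, the `superseded_by` rule and the example records are hypothetical, not BioSharing's actual record format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    """The four status indicators listed above."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


@dataclass
class Record:
    """A curated catalogue record; field names are illustrative assumptions."""
    name: str
    status: Status
    superseded_by: Optional[str] = None  # hypothetical pointer to the successor

    def __post_init__(self):
        # Assumed rule: only a deprecated record can point to a successor.
        if self.superseded_by is not None and self.status is not Status.DEPRECATED:
            raise ValueError(f"{self.name}: only deprecated records can be superseded")


def recommendable(records: List[Record]) -> List[str]:
    """Names of records marked ready for use, implementation, or recommendation."""
    return [r.name for r in records if r.status is Status.READY]


if __name__ == "__main__":
    catalogue = [
        Record("Standard A", Status.READY),
        Record("Standard B", Status.IN_DEVELOPMENT),
        Record("Standard C", Status.DEPRECATED, superseded_by="Standard A"),
    ]
    print(recommendable(catalogue))  # ['Standard A']
```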
Understanding how standards are used: linking a guideline to formats and a terminology
Duplications & lack of interoperability among standards
Technologically-delineated views of the world: transcriptomics, proteomics, metabolomics
Biologically-delineated views of the world: plant biology, epidemiology, microbiology
Technologies: arrays, scanning, columns, gels, MS, FTIR, NMR
Generic features (‘common core’):
- description of source biomaterial
- experimental design components
Hard to use them in combination, e.g. to represent:
o proteomics-based gut microbiota profiling
o proteomics- and metabolomics-based gut microbiota profiling
Enhancing modularization
o proteomics-based gut microbiota profiling
o proteomics- and metabolomics-based gut microbiota profiling
Modularize and combine
High-level information about the metadata standards: MINSEQE, MIMARKS
o BioSharing records bsg-000174, bsg-000161 (biosharing:ReportingGuideline)
Representations of the standards' elements: sample information, sample identifier, taxonomy identifier, sequence read, geo location
Template elements:
o el-000001 (provenance: MINSEQE)
o el-000002 (provenance: MINSEQE and MIMARKS)
o el-000003 (provenance: MIMARKS)
• Serve content standards as machine-readable metadata, providing provenance for their elements
• Inform the creation of metadata templates, rendering standards invisible to researchers
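The "modularize and combine" idea can be sketched as set operations over elements that record their provenance. This minimal example uses only the element identifiers and provenance shown on the slide; the slide does not say which label (sample identifier, geo location, etc.) belongs to which identifier, so none is assigned here. `Element` and `build_template` are hypothetical names, not a BioSharing API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Element:
    """A reusable metadata element and the standards that require it."""
    element_id: str
    provenance: FrozenSet[str]


# Element identifiers and provenance as shown on the slide.
ELEMENTS: List[Element] = [
    Element("el-000001", frozenset({"MINSEQE"})),
    Element("el-000002", frozenset({"MINSEQE", "MIMARKS"})),
    Element("el-000003", frozenset({"MIMARKS"})),
]


def build_template(standards: Iterable[str],
                   elements: List[Element] = ELEMENTS) -> Dict[str, List[str]]:
    """Combine standards into one template.

    Each element required by any chosen standard appears exactly once,
    mapped to the chosen standards it comes from (its provenance).
    """
    chosen = set(standards)
    template = {}
    for element in elements:
        sources = element.provenance & chosen
        if sources:
            template[element.element_id] = sorted(sources)
    return template


if __name__ == "__main__":
    print(build_template(["MINSEQE"]))
    print(build_template(["MINSEQE", "MIMARKS"]))
```

Combining both standards yields three elements rather than four: the shared el-000002 is listed once, with both standards kept as its provenance, so a researcher fills it in a single time.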
Standard developing groups
Journals, publishers
Cross-links, data exchange
Societies and organisations
Institutional RDM services
Projects, programmes

BioSharing overview - NIH bioCADDIE workshop on Common Data Elements, 8th May 2017
