The Research Object Initiative:
Frameworks and Use Cases
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
NIH BD2K BioCADDIE webinar, 11 June 2015
From Manuscripts to
Research Objects
“An article about computational science in a scientific publication is not the
scholarship itself, it is merely advertising of the scholarship. The actual
scholarship is the complete software development environment, [the
complete data] and the complete set of instructions which generated the
figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations,
Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
Scattered Assets
Concept
Drivers for Research Objects (1)
• Computational Workflows /
Scripts
– Multi-step, nested.
– Data, executable codes, services
(remote and local), libraries
– Preservation, Repair
– Reproducibility
• Systems Biology
– Models, data (construction, validation,
predicted), SOPs, samples
– Structured around Investigations,
Studies, Assays
– Exchange
– Reproducibility
Drivers for Research Objects (2)
• Computational Workflows
Commons
– Projects and individuals
– myExperiment.org
• Systems Biology Commons
– Modellers and experimentalists
– Projects and Programs
– Catalogue of research assets
– Fairdomhub.org
– Fair-dom.org
– Seek4science.org
"Mapping present and future predicted distribution patterns for a
meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al
Workflow Commons
https://doi.org/10.15490/seek.1.investigation.56
[Snoep, 2015]
Penkler et al (2015) FEBSJ 282:1481-1511.
https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/
Local
Repositories
LIMS
Public
Repositories
Central repositories
Funding
Agencies
Catalogue
Search
Index
Tools
Research
Infrastructures
execute
companion site
CRIS
results
gateway
catalogue
Standards
metadata
Consumers
Producers
Publishers
haven
platform
Commons
Research Objects
1. Multi-various, citable research products
Research Objects
2. Compound, nested, scattered, yet interconnected
research products, structured investigations
Research Objects
3. Preserved, Portable research products,
inter-platform exchange, reproducibility
Pop-up projects
Dynamic groups
Internal / external visibility
Commons
Research Objects
4. Active research products: evolving. executable.
• Fork.
• Merge.
• Version.
• Cite
• Snapshot.
• Live.
[Martin Scharm]
Haus et al, BMC Systems Biology, 2011, 5:10
Solvent production by Clostridium acetobutylicum
Bigger on the inside than the outside
cite? resolve? steward?
closed
embed
fixed
local
open
alien
refer
fluid
Content
TARDIS Time and Relative Dimension
in Space Scholarship
Multi Span
type
steward
site
author
research
researchers
platforms
time
Contributions
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013
Knowledge
Turning
interpret
Commons
FAIR
Research
Products
Reproducibility
Interpretation
Comparison
Preservation
Portability
Release
Active
Research
http://ccrtypewriter.blogspot.co.uk/
Research Object
means
ends
driver
Framework
Multi-various products, platforms, resources
First class citizens - id, manage, credit, track,
profile, focus
A Framework to Bundle, Port and Link (scattered) resources, related
experiments. Metadata Objects that carry Research Context.
Units of exchange.
Research Objects
http://www.researchobject.org
The Research Object Framework
Desiderata
Technology Independent.
The least possible.
The simplest feasible.
Graceful degradation.
Research Object Framework
Principles & Conventions
API specification
Metadata formats
RO Core
model
using
standards
Annotation
profiles
progressive
extensions
Adobe UCF
ORE
ODF
OADM / PROV
Research Object Framework
Principles & Conventions
API specification
Platform Profiles using legacy &
commodity platforms
Metadata formats
Policies Services
Tools
Lifecycle
Steward
Ship
Training
…
Commodity
Native
RO Core
model
using
standards
Annotation
profiles
progressive
extensions
Adobe UCF
ORE
ODF
OADM / PROV
Identity
Aggregation
Interpretation:
The objects
How they are
linked together
RO Core Model
manifest
Refer to
aggregations
and their
contents
Describe group
& constituents
External ids
Local files
Attribution:
Who, when,
where, why?
Metadata
Description
RO Core Model
Aggregations
Resource maps
Proxies
Annotation first
class and stand-off
Identity persistence and
resolution, Names
Citation
Identity
Annotation
Aggregation
DOIs
URIs
Handles
ORCID
W3C
OADM
OAI-
ORE
manifest
Point of
extendability
Identity
Annotation
Aggregation
RO Core Platform Profiles
DOIs
URIs
Handles
ORCID
Data Citation
Implementation
OAI-
ORE
W3C
OADM
RO Model Ontology
http://w3id.org/ro/
Defines core concepts of research objects, identity,
aggregation, annotation. Used in the manifest
Metadata Objects
Manifest
The Container Manifest content and the
relationships between the content
• RO metadata – id, title, creator, status…
• Aggregates – list of ids/links to resources
• Annotations – list of annotations about
resources
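The three parts of the manifest listed above can be sketched as a small JSON document. This is a minimal illustration, not the normative manifest format: the helper name `build_manifest`, the field names, the example paths and the example.org URL are all assumptions chosen for readability.

```python
import json


def build_manifest(ro_id, title, creator, aggregates, annotations):
    """Assemble a manifest-like dictionary (illustrative field names only)."""
    return {
        # RO metadata: id, title, creator
        "id": ro_id,
        "title": title,
        "createdBy": {"name": creator},
        # Aggregates: ids/links to the resources the RO groups together
        "aggregates": [{"uri": uri} for uri in aggregates],
        # Annotations: which resource each annotation is about,
        # and where the annotation content lives
        "annotations": [
            {"about": about, "content": content}
            for about, content in annotations
        ],
    }


manifest = build_manifest(
    ro_id="/",
    title="Example investigation",
    creator="A. Researcher",
    aggregates=[
        "data/inputA.txt",                      # local, embedded resource
        "http://example.org/remote/model.xml",  # remote resource, by link
    ],
    annotations=[("data/inputA.txt", "annotations/inputA-notes.ttl")],
)

print(json.dumps(manifest, indent=2))
```

Keeping local and remote resources in the same `aggregates` list mirrors the point that the objects can be embedded or reached through links.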
The Objects
• Remote,
through links
• Locally,
embedded
Manifest – remote and local
on my machine
Container Machinery
Manifest
The Container
Packaging:
ZIP files, Docker images…
Catalogues & Commons:
FAIRDOM SEEK, Farr Commons
CKAN, myExperiment…
The Container Manifest
content and the relationships
between the content
Export, archive, publish and transfer ROs.
File format for storage and distribution of
ROs as a ZIP archive
Includes an RO’s manifest, annotations and
some or all of its aggregated resources
Basis for more specific file formats
Backwards compatible: it's a ZIP
Programmatic access: JSON and JSON-LD
manifest, API
https://researchobject.github.io/specifications/bundle/
https://w3id.org/bundle/ doi:10.5281/zenodo.10440
http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf
RO Lifecycles,
Resolution, Citation
• Defend it (snapshot)
• Locate it (most recent)
• Reuse it (a version, a
component)
• Credit it (contributory
authorship)
• Cross link it (connections)
PURL
Checklists
Versioning
Provenance
Dependencies
Annotation
Profiles
Depth: how deeply
described
Coverage: how
much is covered.
Progression levels
Semantic Framework
PID
The Manifest
The Object Metadata
PAV
VoID
VIVO-ISF
PAV
Mim Ontology
Puppet, Makefile
Less detail,
more stakeholders
Checklists
Gamble M, Goble CA, Klyne G, Zhao J
Mim: A minimum information model vocabulary and
framework for scientific linked data IEEE 8th Intl
Conf on eScience pp: 1-8
Zhao J, Klyne G, Gamble M, Goble CA - A Checklist-
Based Approach for Quality Assessment of Scientific
Information Proc Third Linked Science Workshop
2013, co-located ISWC2013.
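A checklist-style assessment can be sketched as a list of requirements evaluated against an RO's metadata. The sketch below is a hypothetical illustration only: it does not use the Mim vocabulary or the authors' tooling, and the requirement levels, labels and manifest fields are assumptions.

```python
def evaluate(manifest, checklist):
    """Run each checklist item against the manifest.

    Returns a list of (level, label, passed) tuples.
    """
    return [(level, label, bool(test(manifest)))
            for level, label, test in checklist]


def satisfied(results):
    """A checklist is satisfied when every MUST item passes."""
    return all(passed for level, _, passed in results if level == "MUST")


# Hypothetical checklist: each item is (level, label, predicate).
checklist = [
    ("MUST", "has a title", lambda m: m.get("title")),
    ("MUST", "aggregates at least one resource", lambda m: m.get("aggregates")),
    ("SHOULD", "has at least one annotation", lambda m: m.get("annotations")),
]

# Hypothetical RO metadata with no annotations yet.
manifest = {
    "title": "Example investigation",
    "aggregates": [{"uri": "data/inputA.txt"}],
    "annotations": [],
}

results = evaluate(manifest, checklist)
for level, label, passed in results:
    print(f"{level:6} {label}: {'ok' if passed else 'missing'}")
print("minimum requirements met:", satisfied(results))
```

Separating MUST from SHOULD items is one way to express progression levels: an RO can meet the minimum while still reporting what richer description is missing.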
Library
Publishers
Experiments
Type specific
PID
Citation
NISO-
JATS
Dublin Core
ISA
MIAME
Wf-Desc
Checklist
Annotation
Profiles
OBI
SBML,
SED-ML
JERM
EXPO
Wf-prov
Use Cases
Use case
• SEEK Commons
for Systems
Biology
• Natively RO
• Export/Import
RO bundles
SEEK Metadata framework
link studies and link assets
Describes
common
elements and
relationships
between things
produced and
used in
experiments.
Structured
descriptions for
consistency and
comparison
Just Enough
Results Model
Snapshots
& Living
Living ROs
Snapshot RO of
investigation
and all its parts
Community Sys Bio Models
metadata + packaging
Bergmann, Rodriguez, Le Novère.
COMBINE archive specification.
<http://identifiers.org/combine.specifications/omex.version-1> (2014)
Bergmann et al. COMBINE archive and OMEX
format: one file to share all information to
reproduce a modeling project, BMC
Bioinformatics 2014, 15:369
Combine with RO.
Standardised metadata & API
http://co.mbine.org/documents/archive
https://github.com/stain/ro-combine-archive
doi:10.5281/zenodo.10439
Bridge from Research to FAIR publishing
Deposit
Run
RO Unzip
RO Query
Use Case: Taverna Workflows
Workflow Results
workflowrun.prov.ttl
(RDF)
outputA.txt
outputC.jpg
outputB/
https://w3id.org/bundle
intermediates/
1.txt
2.txt
3.txt
de/def2e58b-50e2-4949-9980-fd310166621a.txt
inputA.txt
workflow
URI
references
attribution
execution
environment
Aggregating in Research Object
ZIP folder structure (RO Bundle)
mimetype
application/vnd.wf4ever.robundle+zip
.ro/manifest.json
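The folder structure above can be sketched with Python's standard `zipfile` module. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names `write_bundle` and `read_manifest`, the manifest fields, the `@context` value and the placeholder file contents are illustrative, and writing the `mimetype` entry first and uncompressed is assumed here as the convention for ZIP-based container formats.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, manifest, files):
    """Write a ZIP with a leading 'mimetype' entry and .ro/manifest.json."""
    with zipfile.ZipFile(target, "w") as zf:
        # Assumption: 'mimetype' goes first and is stored uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                    compress_type=zipfile.ZIP_DEFLATED)
        # Aggregated resources that are embedded in the bundle.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


def read_manifest(source):
    """Check the mimetype entry and return the parsed manifest."""
    with zipfile.ZipFile(source) as zf:
        if zf.read("mimetype").decode("ascii") != MIMETYPE:
            raise ValueError("not an RO Bundle")
        return json.loads(zf.read(".ro/manifest.json"))


# Illustrative manifest and placeholder contents (not real workflow output).
manifest = {
    "@context": ["https://w3id.org/bundle/context"],
    "id": "/",
    "aggregates": [{"uri": "/outputA.txt"}, {"uri": "/workflowrun.prov.ttl"}],
}
files = {
    "outputA.txt": "placeholder output\n",
    "workflowrun.prov.ttl": "# placeholder provenance\n",
}

buffer = io.BytesIO()  # in-memory stand-in for a .zip file on disk
write_bundle(buffer, manifest, files)

print(zipfile.ZipFile(buffer).namelist())
print(read_manifest(buffer)["aggregates"])
```

Because the result is an ordinary ZIP archive, any unzip tool can still open it; the `mimetype` and `.ro/manifest.json` entries are what let RO-aware tools recognise and query it.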
Workflow Specification
Example data and
config.
Components.
Plug-ins, Versions
Workflow System
Software package
Workflow Runs
Data and
configs
Provenance
logs
Study
Asset specific Commons
Personal Notebook
Community Registry
General Publishing Repository
Use case: ATLAS Collider
Data Analytics
Portable, lightweight
application runtime
and packaging tool.
Image
ATLAS and CMS detector data
Charles Vardeman,
Da Huo
All data and files
of the execution
+ Instructions
convert
bundle
manifest
Relate files
and layers
Add provenance
and annotations
Link in other
content
Use case:
The Farr Institute Commons
safe use of patient and research
data for medical research
clinical study cohorts
Research Objects:
scripts, data, samples…
different e-Labs, legacy data
http://www.farrinstitute.org/
Use case:
The Farr Institute Commons
The open source data portal software
exchange
catalogue
deposit
Uses “code as a
research object”
functionality
Baking RO Infrastructure
make, import, export,
inspect, render, version, process, check, …
• Libraries
– Create and inspect RO Bundles and their metadata
– Java, Ruby and Python
• User tools
– RO Manager: command line tool to make ROs
– ROHUB: a prototype web app to manage ROs
• Platforms
– SEEK
– CKAN plug-in to build, import and export ROs
http://www.researchobject.org/specifications/
NIH BD2K + Research Objects
Metadata Profiles
RO Model API
Community IDs*
RO Model Manifest Profile
Implementation Profiles
*BioMedBridges 10 Rules for Identifiers.
Summary
FAIR Research Objects:
• Concept, model, framework, use cases
• Lightweight, Incremental
Challenges
• Multi-stewarding and lifecycles (OAIS)
• Policy, governance
Partnerships
• Figshare, Oxford Bodleian, Farr Institute
• BioCADDIE?
Acknowledgements & Links
Stian Soiland-Reyes
Matt Gamble
Rob Haines
Sean Bechhofer
Norman Morrison
Phil Crouch
Finn Bacall
Stuart Owen
Carole Goble
Khalid Belhajjame
Graham Klyne
Jun Zhao
Daniel Garijo,
Oscar Corcho
Esteban García
Cuesta
University of
Manchester
University of Oxford
Lancaster University
UPM
http://researchobject.org
http://fair-dom.org
http://www.seek4science.org
http://www.farrinstitute.org
http://www.wf4ever-project.org
http://myexperiment.org
Raul Palma
iSOCO
PSNC
Paris 6
