1. The Research Object Initiative:
Frameworks and Use Cases
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
NIH BD2K BioCADDIE webinar, 11 June 2015
2. From Manuscripts to
Research Objects
“An article about computational science in a scientific publication is not the
scholarship itself, it is merely advertising of the scholarship. The actual
scholarship is the complete software development environment, [the
complete data] and the complete set of instructions which generated the
figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations,
Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
6. Drivers for Research Objects (1)
• Computational Workflows /
Scripts
– Multi-step, nested.
– Data, executable codes, services
(remote and local), libraries
– Preservation, Repair
– Reproducibility
• Systems Biology
– Models, data (construction, validation,
predicted), SOPs, samples
– Structured around Investigations,
Studies, Assays
– Exchange
– Reproducibility
7. Drivers for Research Objects (2)
• Computational Workflows Commons
– Projects and individuals
– myExperiment.org
• Systems Biology Commons
– Modellers and experimentalists
– Projects and Programs
– Catalogue of research assets
– Fairdomhub.org
– Fair-dom.org
– Seek4science.org
8. "Mapping present and future predicted distribution patterns for a
meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al
Workflow Commons
17. Research Objects
3. Preserved, Portable research products,
inter-platform exchange, reproducibility
Pop-up projects
Dynamic groups
Internal / external visibility
Commons
18. Research Objects
4. Active research products: evolving. executable.
• Fork.
• Merge.
• Version.
• Cite
• Snapshot.
• Live.
[Martin Scharm]
Haus et al, BMC Systems Biology, 2011, 5:10
Solvent production by Clostridium acetobutylicum
19. Bigger on the inside than the outside
cite? resolve? steward?
closed
embed
fixed
local
open
alien
refer
fluid
Content
TARDIS Time and Relative Dimension
in Space Scholarship
Multi Span
type
steward
site
author
research
researchers
platforms
time
Contributions
21. Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K, 2013
Knowledge
Turning
interpret
Commons
FAIR
Research
Products
Reproducibility
Interpretation
Comparison
Preservation
Portability
Release
Active
Research
http://ccrtypewriter.blogspot.co.uk/
Research Object
means
ends
driver
23. Multi-various products, platforms, resources
First class citizens - id, manage, credit, track,
profile, focus
A Framework to Bundle, Port and Link (scattered) resources, related
experiments. Metadata Objects that carry Research Context.
Units of exchange.
Research Objects
http://www.researchobject.org
24. The Research Object Framework
Desiderata
Technology Independent.
The least possible.
The simplest feasible.
Graceful degradation.
25. Research Object Framework
Principles & Conventions
API specification
Metadata formats
RO Core
model
using
standards
Annotation
profiles
progressive
extensions
Adobe UCF
ORE
ODF
OADM/
PROV
26. Research Object Framework
Principles & Conventions
API specification
Platform Profiles using legacy &
commodity platforms
Metadata formats
Policies Services
Tools
Lifecycle
Steward
Ship
Training
…
Commodity
Native
RO Core
model
using
standards
Annotation
profiles
progressive
extensions
Adobe UCF
ORE
ODF
OADM/
PROV
27. Identity
Aggregation
Interpretation:
The objects
How they are
linked together
RO Core Model
manifest
Refer to
aggregations
and their
contents
Describe group
& constituents
External ids
Local files
Attribution:
Who, when,
where, why?
Metadata
Description
28. RO Core Model
Aggregations
Resource maps
Proxies
Annotation first
class and stand-off
Identity persistence and
resolution, Names
Citation
Identity
Annotation
Aggregation
DOIs
URIs
Handles
ORCID
W3C OADM
OAI-ORE
manifest
Point of
extendability
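The three parts of the core model (identity, aggregation, annotation) can be pictured as plain data structures. The following is a minimal sketch only: the class and field names (`ResearchObject`, `Annotation`, `aggregate`, `annotate`) are hypothetical and not taken from any RO specification, and all identifiers are made-up example values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """Stand-off annotation: kept outside the resource it describes."""
    about: str    # identifier of the annotated resource
    content: str  # identifier of the resource holding the annotation body


@dataclass
class ResearchObject:
    """Identity + aggregation + annotation (hypothetical names)."""
    identifier: str  # e.g. a DOI, URI or Handle
    creator: str     # e.g. an ORCID
    aggregates: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def aggregate(self, resource_id: str) -> None:
        # Aggregation is a set of identifiers, so duplicates are ignored.
        if resource_id not in self.aggregates:
            self.aggregates.append(resource_id)

    def annotate(self, about: str, content: str) -> None:
        # In this sketch only the RO itself or an aggregated resource
        # may be annotated.
        if about != self.identifier and about not in self.aggregates:
            raise ValueError(f"{about} is not aggregated by this RO")
        self.annotations.append(Annotation(about, content))


# Example values are invented for illustration.
ro = ResearchObject("https://example.org/ro/1",
                    "https://orcid.org/0000-0000-0000-0000")
ro.aggregate("https://example.org/data/results.csv")
ro.annotate("https://example.org/data/results.csv",
            "https://example.org/ro/1/provenance.ttl")
```

Keeping annotations as separate objects that point at their target is what makes them "stand-off": a resource can be described without being modified.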
31. Metadata Objects
Manifest
The Container
Manifest: content and the relationships between the content
• RO metadata – id, title, creator, status, …
• Aggregates – list of ids/links to resources
• Annotations – list of annotations about
resources
The Objects
• Remote,
through links
• Locally,
embedded
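A manifest of this shape might look as follows when written as JSON. This is an illustrative sketch: the key names (`id`, `title`, `createdBy`, `aggregates`, `annotations`) are loosely modelled on the bullets above and have not been checked against the RO specification, and the URLs and paths are invented. The helper `is_remote` is hypothetical and simply separates linked resources from embedded ones.

```python
import json
from urllib.parse import urlparse


def is_remote(uri: str) -> bool:
    """Treat http(s) URIs as remote links; anything else as a local path."""
    return urlparse(uri).scheme in ("http", "https")


# Assumed, illustrative manifest structure (not a normative example).
manifest = {
    "id": "/",
    "title": "Example research object",
    "createdBy": {"name": "A. Researcher"},
    "aggregates": [
        {"uri": "https://example.org/datasets/42"},  # remote, through a link
        {"uri": "workflow/analysis.t2flow"},         # local, embedded
    ],
    "annotations": [
        {"about": "workflow/analysis.t2flow",
         "content": "annotations/workflow-description.ttl"},
    ],
}

remote = [a["uri"] for a in manifest["aggregates"] if is_remote(a["uri"])]
local = [a["uri"] for a in manifest["aggregates"] if not is_remote(a["uri"])]

print(json.dumps(manifest, indent=2))
```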
34. Export, archive, publish and transfer ROs.
File format for storage and distribution of
ROs as a ZIP archive
Includes an RO’s manifest, annotations and
some or all of its aggregated resources
Basis for more specific file formats
Backwards compatible: it's a ZIP
Programmatic access: JSON and JSON-LD
manifest, API
https://researchobject.github.io/specifications/bundle/
https://w3id.org/bundle/ doi:10.5281/zenodo.10440
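Because a bundle is an ordinary ZIP file, it can be sketched with nothing but a standard library. The example below assumes the layout described in the bundle specification as recalled here (a `mimetype` entry stored first and uncompressed, and the manifest at `.ro/manifest.json`); the media type string and the manifest keys should be verified against the specification before use. `write_bundle` is a hypothetical helper, not part of any RO library.

```python
import io
import json
import zipfile

# Media type as recalled from the RO Bundle specification (assumption).
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, manifest: dict, files: dict) -> None:
    """Write manifest and resources into a ZIP-based bundle.

    target: path or writable binary file object
    files:  mapping of path inside the bundle -> bytes
    """
    with zipfile.ZipFile(target, "w") as zf:
        # ZipInfo defaults to ZIP_STORED, so the mimetype entry is
        # written first and uncompressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                    compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)


buf = io.BytesIO()
write_bundle(
    buf,
    {"id": "/", "aggregates": [{"uri": "data/results.csv"}]},
    {"data/results.csv": b"x,y\n1,2\n"},
)

# Any ZIP-aware tool can open the result, which is the
# "backwards compatible" property.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    stored_type = zf.read("mimetype").decode("ascii")
```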
38. Checklists
Gamble M, Goble CA, Klyne G, Zhao J.
MIM: A minimum information model vocabulary and
framework for scientific linked data. IEEE 8th Intl
Conf on eScience, pp. 1-8.
Zhao J, Klyne G, Gamble M, Goble CA.
A Checklist-Based Approach for Quality Assessment of
Scientific Information. Proc. Third Linked Science
Workshop 2013, co-located with ISWC 2013.
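The checklist idea can be illustrated with a small evaluator that runs a list of requirements over an RO's metadata. This is a hedged sketch, not the MIM vocabulary or the method of the cited papers: the `Requirement` type, the MUST/SHOULD levels, the `evaluate` function and the dictionary keys are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class Requirement(NamedTuple):
    level: str                     # "MUST" or "SHOULD" (assumed levels)
    description: str
    check: Callable[[dict], bool]  # returns True when satisfied


def evaluate(ro: dict, checklist: List[Requirement]) -> Dict[str, List[str]]:
    """Return the descriptions of unmet requirements, grouped by level."""
    failures: Dict[str, List[str]] = {"MUST": [], "SHOULD": []}
    for req in checklist:
        if not req.check(ro):
            failures[req.level].append(req.description)
    return failures


# Hypothetical checklist for a "minimally complete" RO.
CHECKLIST = [
    Requirement("MUST", "has a creator",
                lambda ro: bool(ro.get("createdBy"))),
    Requirement("MUST", "aggregates at least one resource",
                lambda ro: len(ro.get("aggregates", [])) > 0),
    Requirement("SHOULD", "has at least one annotation",
                lambda ro: len(ro.get("annotations", [])) > 0),
]

ro = {
    "createdBy": {"name": "A. Researcher"},
    "aggregates": [{"uri": "data/results.csv"}],
    "annotations": [],
}

report = evaluate(ro, CHECKLIST)
passes = not report["MUST"]  # SHOULD failures are warnings only
```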
41. Use case
• SEEK Commons
for Systems
Biology
• Natively RO
• Export/Import
RO bundles
42. SEEK Metadata framework
link studies and link assets
Describes
common
elements and
relationships
between things
produced and
used in
experiments.
Structured
descriptions for
consistency and
comparison
Just Enough
Results Model
44. Community Sys Bio Models
metadata + packaging
Bergmann, Rodriguez, Le Novère.
COMBINE archive specification.
<http://identifiers.org/combine.specifications/omex.version-1> (2014)
Bergmann et al. COMBINE archive and OMEX
format: one file to share all information to
reproduce a modeling project, BMC
Bioinformatics 2014, 15:369
Combine with RO.
Standardised metadata & API
http://co.mbine.org/documents/archive
https://github.com/stain/ro-combine-archive
doi:10.5281/zenodo.10439
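One way to picture "Combine with RO" is to read the contents listed in a COMBINE archive's `manifest.xml` and re-express them as RO-style aggregates. The sketch below is an assumption-laden illustration, not the approach taken in the ro-combine-archive repository: the OMEX namespace and the `location`/`format` attributes are given as recalled from the COMBINE archive specification, the sample XML is invented, and the output keys (`uri`, `format`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as recalled from the COMBINE archive specification (assumption).
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"

SAMPLE_MANIFEST = """
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./manifest.xml" format="http://identifiers.org/combine.specifications/omex-manifest"/>
  <content location="./model.xml" format="http://identifiers.org/combine.specifications/sbml"/>
</omexManifest>
"""


def omex_to_aggregates(manifest_xml: str) -> list:
    """List archive contents as RO-style aggregate entries."""
    root = ET.fromstring(manifest_xml)
    aggregates = []
    for content in root.findall(f"{{{OMEX_NS}}}content"):
        location = content.get("location", "")
        # Skip the archive itself and its own manifest.
        if location in (".", "./manifest.xml"):
            continue
        if location.startswith("./"):
            location = location[2:]
        aggregates.append({"uri": location, "format": content.get("format")})
    return aggregates


aggregates = omex_to_aggregates(SAMPLE_MANIFEST)
```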
52. Workflow Specification
Example data and
config.
Components.
Plug-ins, Versions
Workflow System
Software package
Workflow Runs
Data and
configs
Provenance
logs
Study
Asset specific Commons
Personal Notebook
Community Registry
General Publishing Repository
53. Use case: ATLAS Collider
Data Analytics
Portable, lightweight
application runtime
and packaging tool.
Image
ATLAS and CMS detector data
Charles Vardeman, Da Huo
All data and files
of the execution
+ Instructions
convert
bundle
manifest
Relate files
and layers
Add provenance
and annotations
Link in other
content
54. Use case:
The Farr Institute Commons
safe use of patient and research
data for medical research
clinical study cohorts
Research Objects:
scripts, data, samples…
different e-Labs, legacy data
http://www.farrinstitute.org/
55. Use case:
The Farr Institute Commons
The open source data portal software
exchange
catalogue
deposit
58. Baking RO Infrastructure
make, import, export,
inspect, render, version, process, check, …
• Libraries
– Create and inspect RO Bundles and their metadata
– Java, Ruby and Python
• User tools
– RO Manager: command line tool to make ROs
– ROHUB: a prototype web app to manage ROs
• Platforms
– SEEK
– CKAN plug-in to build, import and export ROs
http://www.researchobject.org/specifications/
59. NIH BD2K + Research Objects
Metadata Profiles
RO Model API
Community IDs*
RO Model Manifest Profile
Implementation Profiles
*BioMedBridges 10 Rules for Identifiers.
60. Summary
FAIR Research Objects:
• Concept, model, framework, use cases
• Lightweight, Incremental
Challenges
• Multi-stewarding and lifecycles (OAIS)
• Policy, governance
Partnerships
• Figshare, Oxford Bodleian, Farr Institute
• BioCADDIE?
61. Acknowledgements & Links
Stian Soiland-Reyes
Matt Gamble
Rob Haines
Sean Bechhofer
Norman Morrison
Phil Crouch
Finn Bacall
Stuart Owen
Carole Goble
Khalid Belhajjame
Graham Klyne
Jun Zhao
Daniel Garijo
Oscar Corcho
Esteban García Cuesta
University of
Manchester
University of Oxford
Lancaster University
UPM
http://researchobject.org
http://fair-dom.org
http://www.seek4science.org
http://www.farrinstitute.org
http://www.wf4ever-project.org
http://myexperiment.org
Raul Palma
iSOCO
PSNC
Paris 6