Research Object Composer:
Publishing Complex Data Objects in the FAIRground
Presented by Anita de Waard, VP Research Collaborations
September 24, 2019
Biomedical data moving to the cloud
“One important purpose of the Commons Pilot is to collectively agree on a set of best practices … to eliminate barriers for accessing, sharing and analyzing biomedical data.”
“Storing, managing, standardizing and publishing the vast amounts of data produced by biomedical research is a critical mission for the National Institutes of Health.”
2
Findable, Accessible, Interoperable, Reusable Data
Building blocks for the FAIRground!
Fairly AI-Ready?
Building an open interoperable data ecosystem:
User story:
As a researcher studying genetic disease X…
I want to
• access 1000s of DNA sequences of a population and run analysis Y
• share the results of my findings, protocol and input data with my collaborators/community
• publish an article about it in a way that keeps the data FAIR along each step of the process
so that others can reproduce and build on this work.
Open API!
3
Cloud data is accessible if openly disseminated
Need open data & identifiers for workflow tools:
Requirements:
1. Landing page URL including GUID
2. URL for page where file can be accessed (downloaded)
3. Metadata for object
4. Reference to the Task (zero or one) that this dataset was Derived From
5. Reference to the Task(s) that this dataset is the Source Of
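The five requirements above can be read as the fields of a single record. The following is a minimal sketch under that assumption; the class name `DatasetRecord`, its field names, the `problems` check and the example.org URLs are all hypothetical and do not come from any existing tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical record holding the five items from the requirements list."""
    landing_page_url: str                  # 1. landing page URL, expected to contain the GUID
    download_url: str                      # 2. URL where the file can be downloaded
    metadata: Dict[str, str]               # 3. metadata for the object
    derived_from: Optional[str] = None     # 4. zero or one Task this dataset was derived from
    source_of: List[str] = field(default_factory=list)  # 5. Tasks this dataset is the source of

    def problems(self, guid: str) -> List[str]:
        """Return human-readable descriptions of unmet requirements."""
        issues = []
        if guid not in self.landing_page_url:
            issues.append("landing page URL does not include the GUID")
        if not self.download_url:
            issues.append("no download URL")
        if not self.metadata:
            issues.append("no metadata")
        return issues


# Placeholder values for illustration only.
record = DatasetRecord(
    landing_page_url="https://repo.example.org/datasets/abc-123",
    download_url="https://repo.example.org/datasets/abc-123/file",
    metadata={"title": "Sequences for population P"},
    derived_from="task-1",
    source_of=["task-2", "task-3"],
)
print(record.problems(guid="abc-123"))
```

Keeping `derived_from` optional and `source_of` a list mirrors the "zero or one" and "Task(s)" wording of items 4 and 5.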
4
Building an open interoperable data ecosystem:
• Aggregates: link things together
• Annotations: about things & their relationships
• Container: packaging content & links (Zip files, BagIt, Docker images)
• Identification: locate things regardless of where they are
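As a rough illustration of how aggregation, annotation and identification fit together, the sketch below builds a small manifest as a plain dictionary. The key names (`aggregates`, `annotations`, `about`, `content`) are assumptions that loosely echo Research Object manifests; the helper names and URLs are hypothetical, and the packaging (container) step is left out.

```python
import json


def build_manifest(resource_uris, annotations):
    """Aggregate resources by URI and attach annotations about them.

    annotations is a list of (about_uri, content_uri) pairs.
    """
    return {
        "aggregates": [{"uri": uri} for uri in resource_uris],
        "annotations": [
            {"about": about, "content": content} for about, content in annotations
        ],
    }


def dangling_annotations(manifest):
    """List annotation targets that are not among the aggregated resources."""
    aggregated = {entry["uri"] for entry in manifest["aggregates"]}
    return [a["about"] for a in manifest["annotations"] if a["about"] not in aggregated]


# Placeholder URIs: identification means each thing is located by URI,
# regardless of where it is stored.
manifest = build_manifest(
    resource_uris=[
        "https://repo.example.org/datasets/abc-123",
        "https://workflows.example.org/runs/42",
    ],
    annotations=[
        ("https://repo.example.org/datasets/abc-123", "annotations/provenance.json"),
    ],
)
print(json.dumps(manifest, indent=2))
print(dangling_annotations(manifest))
```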
5
Building an open interoperable data ecosystem:
[Diagram labels: database; Open repository; Workflow Tool; Workflow: Input, Task 1, Task 2, Task 3, Output]
Research Object Composer
http://www.researchobject.org
Research Object Profiler
Add annotations and relationships (metadata) to a collection to describe a research object:
- URI
- Length
- Filename
- Checksums
etc.
Research Object Serializer
Serialise the Research Object in a standard format based on BagIt (a manifest itemizing file names)
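To make the profiling and serialising steps concrete, here is a minimal sketch that records URI, length, filename and a checksum for each file and writes a BagIt-style layout (a `bagit.txt` declaration, a `data/` payload directory and a `manifest-sha256.txt` listing checksums and file names). The function names and sample files are hypothetical, and this is not the RO Composer's own serializer.

```python
import hashlib
import tempfile
from pathlib import Path


def profile_file(path: Path, uri: str) -> dict:
    """Collect the profile fields named on the slide for one file."""
    data = path.read_bytes()
    return {
        "uri": uri,
        "filename": path.name,
        "length": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def write_bag(bag_dir: Path, payload: dict) -> list:
    """Write payload (filename -> bytes) as a BagIt-style bag and return the profiles."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    profiles = []
    for name, content in payload.items():
        target = data_dir / name
        target.write_bytes(content)
        profiles.append(profile_file(target, uri=f"data/{name}"))
    # Bag declaration file.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # Payload manifest: one "checksum  path" line per file.
    lines = [f"{p['sha256']}  {p['uri']}" for p in profiles]
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return profiles


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "ro-bag"
    profiles = write_bag(
        bag, {"input.csv": b"a,b\n1,2\n", "protocol.txt": b"run analysis Y\n"}
    )
    manifest_text = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    print(manifest_text)
```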
Open API
6
Mendeley Data
• DOIs
• Metadata
(Findability)
• Open repo
(Accessibility)
• Versioning
• RO Standard
(Interoperability, Reusability)
Purpose of the Research Object Composer*:
• The RO Composer is not a registry of research objects, but it can list research objects currently under construction.
• The RO Composer is a microservice whose responsibility is to help other services create and deposit research objects.
• The composer acts as a temporary construction site that can be completed by multiple services (e.g. a data management system, a workflow system, a user interface).
• These clients jointly build a Research Object that can then be validated according to the schema, before the RO is downloaded or deposited into an archive (like Zenodo or Mendeley Data).
• Clients of the RO Composer are applications (driven by a user interface) or agents (engaged automatically from other events, e.g. a workflow run).
• The RO Composer is not a required component for this: any software may generate research objects by following Research Object specifications.
* From: https://github.com/ResearchObject/research-object-composer/blob/master/introduction.ipynb
7
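The "construction site" idea can be sketched as a small in-memory model: several clients contribute parts, the object is validated, and only a complete object can be deposited. This is an illustration of the flow only and does not call the real RO Composer API; the class, the required field names and the example values are all hypothetical.

```python
class ConstructionSite:
    """In-memory stand-in for a research object under construction."""

    # Hypothetical schema: the fields a complete research object must have.
    REQUIRED_FIELDS = ("title", "data", "workflow")

    def __init__(self):
        self.content = {}
        self.contributors = {}

    def contribute(self, client, field_name, value):
        """Let one client (application or agent) fill in one part."""
        self.content[field_name] = value
        self.contributors[field_name] = client

    def missing_fields(self):
        """Validate against the schema and report what is still missing."""
        return [name for name in self.REQUIRED_FIELDS if name not in self.content]

    def deposit(self, archive):
        """Append the finished object to an archive (a plain list here)."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"research object incomplete, missing: {missing}")
        archive.append(dict(self.content))
        return len(archive) - 1


site = ConstructionSite()
site.contribute("data management system", "data",
                ["https://repo.example.org/datasets/abc-123"])
site.contribute("workflow system", "workflow",
                "https://workflows.example.org/runs/42")
print(site.missing_fields())  # the title has not been supplied yet

site.contribute("user interface", "title", "Analysis Y of disease X")
archive = []
position = site.deposit(archive)
```

Validation happens before deposit, so an incomplete object never reaches the archive, which matches the ordering described in the bullets.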
You can drive it today!
• API: https://researchobject.github.io/research-object-composer/api/
• Source: https://github.com/ResearchObject/research-object-composer
• Link to Jupyter Notebook tutorial (even I can do it!)
8
Use case for the ROC: Earth Sciences!
EVER-EST – RO in Earth Sciences
12 EU partners
4 research communities
Powered by ROHub
9
Other use case: Chemistry! NMReDATA
10
http://nmredata.org/
NMReDATA:
• chemical shifts, scalar couplings, multiplet analysis and 2D cross peaks extracted from a set of NMR spectra
• linked to the assigned chemical structure.
• data resulting from full analysis of organic compounds and natural products using various spectra.
NMR Record:
• Database entry or folders including a .sdf file (containing the chemical structure and the NMReDATA)
• Folders including the relevant NMR spectra (with FID, acquisition and processing parameters in the manufacturer’s format).
• To facilitate transfers and exchanges of records, the folder can be compressed in the .zip format.
• The NMR records (and the .sdf file) will be generated by computer-assisted structure elucidation software or web-based tools.
Sounds like a ResearchObject to us…?
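The NMR record layout described above (a folder holding a .sdf file plus spectrum folders, compressed as .zip) can be sketched with the Python standard library. The folder and file names below are placeholders, not the names required by the NMReDATA format, and the file contents are dummy bytes.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_record(record_dir: Path, zip_path: Path) -> list:
    """Compress a record folder into a .zip and return the archived names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(record_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the folder name is kept.
                archive.write(path, path.relative_to(record_dir.parent).as_posix())
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "compound1"
    spectrum = record / "spectrum_1"
    spectrum.mkdir(parents=True)
    # Placeholder files standing in for the structure file and raw spectrum data.
    (record / "compound1.sdf").write_text("placeholder structure and NMReDATA\n")
    (spectrum / "fid").write_bytes(b"\x00\x01\x02")
    (spectrum / "parameters.txt").write_text("placeholder acquisition parameters\n")
    names = zip_record(record, Path(tmp) / "compound1.zip")
    print(names)
```

The archive is written next to the record folder rather than inside it, so the .zip does not end up containing itself.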
Some questions to ponder:
• How to enable interoperability between ROC and other repositories?
• How do we get the word out there and get people to use ROs at scale?
• What are the challenges for wide adoption by repositories? Authoring tools? Workflow tools?
• How do ROs fit in with other initiatives: is an RO Data, Software, both?
− Citations? Cf. Software citation
− Credit? Does it go along with new credit metrics, Make Data Count, etc.?
• What role can publishers play in this?
− Support standards (sit on panels, etc.)
− What else??
11
Acknowledgements:
This work was funded by the National Institutes of Health, National Heart, Lung, and Blood Institute STAGE Project, with Seven Bridges Genomics Inc., Agreement No. 1 OT3 OD025463-01
And performed by:
12
Marina Soares E Silva
Chris Wright
Wouter Haak
Carole Goble
Stian Soyland-Reyes
Finn Bacall

Research Object Composer: A Tool for Publishing Complex Data Objects in the Cloud

Editor's Notes

  • #3 Big biomedical data has the potential to deliver more knowledge about diseases, faster.
  • #4 Collaboration between, among others, data service providers and developers of research object standards increases the chance of delivering an interoperable, open research data ecosystem that we aim to make sustainable and scalable.
  • #6 Standards-based metadata framework for logically and physically bundling resources with context http://researchobject.org