SlideShare a Scribd company logo
1 of 62
The swings and roundabouts
of a decade of fun and games
with Research Objects
Professor Carole Goble
The University of Manchester, ELIXIR-UK
Software Sustainability Institute UK
carole.goble@manchester.ac.uk
17th Italian Research Conference on Digital Libraries (IRCDL) 2021, 18th February 2021
I work at supporting
Data Driven
Research at three
scales
Across the research
data lifecycle
Life Sciences,
Biodiversity
social sciences, astronomy,
digital libraries, chemistry etc
EU Research
Infrastructures
Data availability
Methods
Source data
Suppl info
Methods
De novo assembly and binning
Raw reads from each run were first assembled with SPAdes v.3.10.020 with option --meta21.
Thereafter, MetaBAT 215 (v.2.12.1) was used to bin the assemblies using a minimum contig
length threshold of 2,000 bp (option --minContig 2000) and default parameters. Depth of
coverage required for the binning was inferred by mapping the raw reads back to their
assemblies with BWA-MEM v.0.7.1645 and then calculating the corresponding read depths
of each individual contig with samtools v.1.546 (‘samtools view -Sbu’ followed by ‘samtools
sort’) together with the jgi_summarize_bam_contig_depths function from MetaBAT 2. The
QS of each metagenome-assembled genome (MAG) was estimated with CheckM v.1.0.722
using the lineage_wf workflow and calculated as: level of completeness − 5 ×
contamination. Ribosomal RNAs (rRNAs) were detected with the cmsearch function from
INFERNAL v.1.1.247 (options -Z 1000 --hmmonly --cut_ga) using the Rfam48 covariance
models of the bacterial 5S, 16S and 23S rRNAs. Total alignment length was inferred by the
sum of all non-overlapping hits. Each gene was considered present if more than 80% of the
expected sequence length was contained in the MAG. Transfer RNAs (tRNAs) were identified
with tRNAscan-s.e. v.2.049 using the bacterial tRNA model (option -B) and default
parameters. Classification into high- and medium-quality MAGs was based on the criteria
defined by the minimum information about a metagenome-assembled genome (MIMAG)
standards23 (high: >90% completeness and <5% contamination, presence of 5S, 16S and 23S
rRNA genes, and at least 18 tRNAs; medium: ≥ 50% completeness and <10% contamination).
Assignment of MAGs to reference databases
Four reference databases were used to classify the set of MAGs recovered from the human
gut assemblies: HR, RefSeq, GenBank and a collection of MAGs from public datasets. HR
comprised a total of 2,468 high-quality genomes (>90% completeness, <5% contamination)
retrieved from both the HMP catalogue (https://www.hmpdacc.org/catalog/) and the
HGG8. From the RefSeq database, we used all the complete bacterial genomes available (n
= 8,778) as of January 2018. In the case of GenBank, a total of 153,359 bacterial and 4,053
eukaryotic genomes (3,456 fungal and 597 protozoan genomes) deposited as of August
2018 were considered. Lastly, we surveyed 18,227 MAGs from the largest datasets publicly
available as of August 201813,16,17,18,19, including those deposited in the Integrated Microbial
Genomes and Microbiomes (IMG/M) database52. For each database, the function ‘mash
sketch’ from Mash v.2.053 was used to convert the reference genomes into a MinHash
sketch with default k-mer and sketch sizes. Then, the Mash distance between each MAG
and the set of references was calculated with ‘mash dist’ to find the best match (that is, the
reference genome with the lowest Mash distance). Subsequently, each MAG and its closest
relative were aligned with dnadiff v.1.3 from MUMmer 3.2354 to compare each pair of
genomes with regard to the fraction of the MAG aligned (aligned query, AQ) and ANI.
https://doi.org/10.1038/s41586-019-0965-1
scripts, workflows, SOPs & datasets
scripts, workflows, SOPs & datasets
https://doi.org/10.1038/s41586-019-0965-1
Objects are the Outcomes of Research
research outcomes are more than just publications and data
software, models, workflows, SOPs, lab protocols….
all are first class citizens of scholarship
information required to make research
FAIR and Reproducible (FAIR+R) …
From FAIR data to FAIR Digital Objects
To be FAIR
each digital object
type has its own
metadata
requirements,
and may have its
own repositories
and registries
FAIR Digital Objects for Science: From Data Pieces to Actionable
Knowledge Units: https://doi.org/10.3390/publications8020021
From FAIR data to FAIR Digital Objects
The FAIR Guiding Principles for Data Stewardship and Management
Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
Objects are the Outcomes of Research
Each object has its own metadata and repositories
De-contextualised
Static, Fragmented
Lost Semantic linking
Contextualised
Active, Unified
Semantic linking
Buried in a
PDF figure
Scattered Reporting and Reading
Data Files
Methods: Scripts,
CompWorkflows
URLs to
external
data
Since 2007…
https://myexperiment.org
Systems Biology
Structured, interrelated objects in context
Data files
SOP docs
Models
URLs to
resources
Since 2008…
https://fairdomhub.org
shareable, cite-able, exchangeable resource, with versioning and snapshots
Research Object
general framework - research outcomes related and bundled together
enriching resources and collections with additional
information required to make research FAIR and reproducible
metadata describing content and context
dependencies, versions, relationships, provenance, annotations …
DCAT
Schema.org
EML
MIAPPE
CodeMeta SBML
Bioschemas/CWL DataCite
integrated view over fragmented resources using PIDs
bigger on the inside than the outside
encapsulated content and references to external resources
has its own metadata, can be registered and deposited in its own right,
unpackaged and accessed, activated and reproduced if appropriate
encapsulated content and references to external resources
RO is the Unit of Knowledge Exchange
Between scholarly
communication and
research infrastructure
platforms
->
Between researchers
From the big picture
to the small picture in
practice
standards-based metadata framework for bundling resources
with context into citable reproducible packages
researchobject.org
Linked Data
Bechhofer et al (2013)Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects:Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Since 2010…
self-describing, chiefly metadata, objects
RO Metadata file Structured metadata about the RO and content
files
links to web
resources
RO Content
Archive file format / packaging system
BagIt, zip
OCFL, Git
type
id
description
datePublished
…
directories
license
author organisation
self-describing, chiefly metadata, objects
RO Metadata file Structured metadata about the RO and content
image file
links to web
resources
RO Content
Archive file format / packaging system
directory of data
type, id
description
datePublished
creator
size
format …
"
https://zenodo.org/record/3541888
https://github/script
type
id
description
datePublished
…
license
author organisation
“Web Linked Data”
approach
self-describing, chiefly metadata, objects
RO Metadata file Structured metadata about the RO and content
image file
links to web
resources
RO Content
Archive file format / packaging system
directory of data
type, id
description
datePublished
creator
size
format …
"
https://zenodo.org/record/3541888
type
id
description
datePublished
…
license
author organisation
Workflow RO
“Web Linked Data”
approach
self-describing, chiefly metadata, objects
How do we describe the metadata?
PIDs + JSON-LD + Schema.org
descriptors
How can I add additional metadata?
Schema.org, domain ontologies
How do I define a checklist of what is
expected to be in a type of RO?
RO Profile
http://www.researchobject.org/ro-crate/
Standard
Web
Mark-up
RO-Crate in a nutshell
Practical lightweight approach to packaging research data
entities (any object) with metadata
Aggregate files and/or any URI-addressable content, with
contextual information to aid decisions about re-use:Who
WhatWhenWhereWhy How.
Web Native Machine readable. Human readable. Search
engine friendly. Familiar.
Extensible and Incremental: add additional metadata; nested
and typed by their profile.
Open Community effort
38 contributors
fromAU, EU and US
http://www.researchobject.org/ro-crate/
Lifting the Skirts to look at the
underwear
https://w3id.org/ro/crate/1.1
Released 30th October 2020
http://www.researchobject.org/ro-crate/
Community
specific
Combine Archive
Publisher
specific
[Bergmann et al. 2014].
Platform
specific
DataONE
A trend
Examples
Cultural Heritage: A data curation service
for endangered languages: 500,000 files in
28,624 items and 574 collections
long term preservation and
accessibility of research data objects
https://arkisto-platform.github.io/
[Marco La Rosa, Peter Sefton]
Describe the data as
it’s being created
using open standards
and tools
source: https://guides.library.ucsc.edu/datamanagement/
[Marco La Rosa, Peter Sefton]
Scalable verified
collections of references
Processing big genomic & clinical data
distributed over multiple locations
NIH Data Commons
[Chard, et al 2016]
https://doi.org/10.1109/BigData.2016.7840618
minids
Retain and archive processed datasets
Reference and transfer large data on demand
Controlled access to sensitive data
[Kesselman, Foster]
Data and Method Commons
13 EU Life Science Research Infrastructures
Sharing data, tools and workflows in the cloud
100+ data collections
method commons
Data and Method Commons
13 EU Life Science Research Infrastructures
exchanging and preserving
secure data objects
Sharing data, tools and workflows in the cloud
interchange, stewardship,
recording dependencies -> portability &
reuse/reproducibility of workflow objects
https://workflowhub.eu/
Profile
Do Research
Assemble
Methods, Materials
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Publish
Share
Results
Manage
Results
Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
Experiment
Observe
Simulate
Describe and release the data , workflows, research
as its being created, updated and used
Getting embedded into the European Open Science Cloud
Collaborative environments and
EGI-ACE data spaces for Earth
Science researchers
RO-Crate as interchange format and
integration with Zenodo, using Describo
Pan-European natively FAIR
and GDPR compliant data
storage and sharing fabric
Science Mesh using Cloud
Services for Synchronization
and Sharing (CS3)
RO to manage research activities
and release snapshots to Zenodo
exchange between
genomics platform and
repository.
standardise and share
analyses generated from
genome sequencing.
Scholarly Communication Drivers
Exchange & Import/Export
Reuse & Reproduce
Report & Archive
Living Objects
Share & Access
Figure: RDMKit, https://rdmkit.elixir-europe.org/
Swings and Roundabouts
why is it called “RO-Crate”??
Activation &
research in a
project
Adoption by
other projects
Adoption
Adoption in
mainstream
infrastructure
Research ObjectVersion 1
2010-2017
http://wf4ever.org/
Sharing of data-intensive computational workflows
Preservation and managing their decay
• documentation of workflows
• interconnections between workflows and related
resources (e.g., datasets, publications, etc.)
• documentation of their dependencies and their outputs
• social aspects
Type-cast  not just for workflows or biology!
Howard Ratner,
Chair STM Future Labs Committee, CEO EVP Nature PublishingGroup
Director of Development for CHORUS
2012
2017
adoption stuck in inner groups
not mainstream or outsiders
a swing back to basics
reboot!
Machine-processable
Standards
Low tech
Graceful degradation
Commodity tooling
Incremental
Multi-platform
Technology Independent
Keep it
simple
E X A M P L E S
Developer
friendliness
A swing back to basics
"just enough complexity / standards”
sufficient extra benefits from what
already exists…
…without compromising the developer
entry-level experience so much that they
rather do their own thing.
Version 1
Linked Data Purity
Construct
metadata
file
Describe
metadata
content
Mismash of
ontologies
combined together
to describe
metadata file
RDF Shapes
Represent profiles
Hard to make
checklists
SHACL, ShEx
W3C Web Annotation Vocabulary
OAI Object Exchange and Reuse
Indeed.
Linked Data is not enough.
And sometimes its too much.
Research Infrastructures:
“digital technologies (hardware, software),
resources (data, services, digital libraries,
standards), comms (protocols, access rights,
networks), people and organisational
structures”
Tensions
Research Infrastructures sit in the middle
Academic
Viewpoint
Infrastructure
Viewpoint
Green field site
Theoretical purity
Use latest thing
Proof of concept
Sophistication
Narrow developer audience
Strive for super generic
The end
Exposing the tech
Pre-existing platforms
Practicality
Use things that work
Production
Simplicity & Familiarity
Wide developer audience
Several specific is ok!
The means
Hiding the tech
2018: Digital Libraries & Developer friendly reboot
Peter Sefton
Semantic Web world vs RealWorld
Open Repositories & Dig Lib Community
DataCrate
simple web
stack
RO
rich RDF
stack
+
fewer features
easier to understand, conceptually simpler
opinionated guide to current best practices
software stacks widely used on the Web
Just Enough Linked Data Just InTime
simplifications rather than generalizations
Retain benefits of Linked Data
querying, graph stores,
vocabularies, clickable URIs
customization and conventions
The stuff a developer needs
documentation, examples, libraries,
tools, community
limited flexibility frees up developers
familiarity is important
Linked Data “exotics” there for when the time is right if needed by the right people
+
Developer friendly ->Tooling, Libraries
How can I use it?
While we’re mostly focusing on the specification, some tools already exist for working with
RO-Crates:
● Describo interactive desktop application to create, update and export RO-Crates for
different profiles. (~ beta)
● CalcyteJS is a command-line tool to help create RO-Crates and HTML-readable
rendering (~ beta)
● ro-crate - JavaScript/NodeJS library for RO-Crate rendering as HTML. (~ beta)
● ro-crate-js - utility to render HTML from RO-Crate (~ alpha)
● ro-crate-ruby Ruby library to consume/produce RO-Crates (~ alpha)
● ro-crate-py Python library to consume/produce RO-Crates (~ planning)
These applications use or expose RO-Crates:
● Workflow Hub imports and exports Workflow RO-Crates
● OCFL-indexer NodeJS application that walks the Oxford Common File Layout on the
file system, validate RO-Crate Metadata Files and parse into objects registered in
Elasticsearch. (~ alpha)
● ONI indexer
● ocfl-tools
● ocfl-viewer
● Research Object Composer is a REST API for gradually building and depositing
Research Objects according to a pre-defined profile. (RO-Crate support alpha)
● … (yours?)
https://uts-eresearch.github.io/describo/
People! A Community
Sponsored
RO2018, Amsterdam
e-Science Conference
https://www.researchobject.org/ro-crate/#contribute
Diverse set of people
Variety of stakeholders
Set of collective norms
Open platform for communication
RO2018
People! Leadership, Champions, Early Adopters
we had to reboot the community twice….
Stian Soiland-Reyes
The University of Manchester, UK
Peter Sefton
University ofTechnology Sydney Infrastructure buy in lifts from project to mainstream
tendency to first focus on
providers…
…but (developer) consumers
build the ecosystem…
Consumers
Archivist and library specialists know the
importance of metadata and standards…
… and for things to work 5, 10, 20 years later.
End-users and Developers handle legacy …
…. their field, repositories, institutions , journals
etc. will always be lagging behind the curve
Researchers, to make their object FAIR …
… do not have the resources or know-how….
Specification &
standards-based
realisation
Incremental &
Extensible, Practical,
Familiar
Built in to their
platforms they use in
their day-to-day work
Problem Community Reference
Examples
Tools
Best Practice Guides
Tutorials
Embedded
Go Round and Round and Round & Maintain Momentum
Back to the Big Picture.
Opportunities
for Scholarly Communications
I concentrated on developers….what about end users?
Big Picture
User-Ware
Familiar tools in
the research
process
Infrastructure
Under-Ware
Familiar tech in
the infrastructure
Registries
Repositories
Tools
Applications
Built in
On ramps
Metadata
automation
I concentrated on developers….what about end users?
Infrastructure
RO-Crate Commons
Ecosystem
Knowledge graphs Citation and tracking
Reproducibility
Release updates
New services
Built by
developers
New user
capabilities
Living and actionable publications
whose content transcends platforms
threaded publications
FAIR Digital
Objects
Back to FAIR Digital Objects
Actionable knowledge unit
Digital butterfly – digital twins
Bags of references
courtesy Dimitris Koureas
Coordinator DiSSCo EU Research
Infrastructure
Specimen object image
courtesy of Alex Hardisty
[Hardisty et al, 2020]
European Open Science Cloud
EOSC Interoperability Framework
Specimen Data Refinery
Workflows to Digitise
Natural History Specimens
FAIR Digital Object
Framework
Open Digital Specimen
Workflow Infrastructure
RO-Crate
+
https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5-
01aa75ed71a1/language-en/format-PDF/source-190308283
FAIR Principles for Digital Research Objects
FAIR all the way down
Unbounded FAIR
Distributed FAIR
Living FAIR
Analogous to FAIR Software
FAIR RO-Crate is a practical start
Fig from EOSC Interoperability Framework
Preparing for Digital Objects…..
Digital library community allies!
Make Research Object’s normative
– Promote to researchers but …
– ... target Research Infrastructures to deliver
– Move RO(-Crate)s across the scholarly ecosystem
Developer friendliness matters
– Persuasive design
FAIR principles for Research Objects….
– Unifying the vision with the practical
Release
paradigm
instead of a
Publish one
Barend Mons
Sean Bechhofer
Matthew Gamble
Raul Palma
Jun Zhao
Mark Robinson
AlanWilliams
Norman Morrison
Stian Soiland-Reyes
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
DanielGarijo
Catarina Martins
Iain Buchan
Michael Crusoe
Rob Finn
StuartOwen
Finn Bacall
Bert Droebeke
Laura Rodríguez Navas
Ignacio Eguinoa
Carl Kesselman
Ian Foster
Kyle Chard
Vahan Simonyan
Ravi Madduri
Raja Mazumder
GilAlterovitz
Denis Dean II
DurgaAddepalli
Wouter Haak
Anita DeWaard
Paul Groth
Oscar Corcho
Peter Sefton
Eoghan Ó Carragáin
FrederikCoppens
Jasper Koehorst
Simone Leo
Nick Juty
LJ Garcia Castro
Karl Sebby
Alexander Kanitz
Ana Trisovic
http://researchobject.org
Gavin Kennedy
Mark Graves
José María Fernández Jose
Manuel Gomez-Perez
Jason A. Clark
Salvador Capella-Gutierrez
Alasdair J. G. Gray
Kristi Holmes
Giacomo Tartari
Hervé Ménager
Paul Walk
Brandon Whitehead
Erich Bremer
Mark Wilkinson
Jen Harrow
Marco La Rosa
And many more....

More Related Content

What's hot

What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community UpdateCarole Goble
 
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryCarole Goble
 
ELIXIR UK Node presentation to the ELIXIR Board
ELIXIR UK Node presentation to the ELIXIR BoardELIXIR UK Node presentation to the ELIXIR Board
ELIXIR UK Node presentation to the ELIXIR BoardCarole Goble
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryCarole Goble
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsSimon Cockell
 
Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Carole Goble
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOMCarole Goble
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Carole Goble
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsCarole Goble
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data ManagementCarole Goble
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better ResearchCarole Goble
 
How are we Faring with FAIR? (and what FAIR is not)
How are we Faring with FAIR? (and what FAIR is not)How are we Faring with FAIR? (and what FAIR is not)
How are we Faring with FAIR? (and what FAIR is not)Carole Goble
 
Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Jamie Bisset
 
Building collaborative workflows for scientific data
Building collaborative workflows for scientific dataBuilding collaborative workflows for scientific data
Building collaborative workflows for scientific dataBruno Vieira
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceCarole Goble
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?Paolo Romano
 

What's hot (20)

What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community Update
 
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK Story
 
ELIXIR UK Node presentation to the ELIXIR Board
ELIXIR UK Node presentation to the ELIXIR BoardELIXIR UK Node presentation to the ELIXIR Board
ELIXIR UK Node presentation to the ELIXIR Board
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformatics
 
Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher?
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of Scientists
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data Management
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better Research
 
How are we Faring with FAIR? (and what FAIR is not)
How are we Faring with FAIR? (and what FAIR is not)How are we Faring with FAIR? (and what FAIR is not)
How are we Faring with FAIR? (and what FAIR is not)
 
Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction)
 
Building collaborative workflows for scientific data
Building collaborative workflows for scientific dataBuilding collaborative workflows for scientific data
Building collaborative workflows for scientific data
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?
 

Similar to The swings and roundabouts of a decade of fun and games with Research Objects

RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsCarole Goble
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Syed Ahmad Chan Bukhari, PhD
 
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...Ahmad C. Bukhari
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalWaqas Tariq
 
IBC FAIR Data Prototype Implementation slideshow
IBC FAIR Data Prototype Implementation   slideshowIBC FAIR Data Prototype Implementation   slideshow
IBC FAIR Data Prototype Implementation slideshowMark Wilkinson
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Global RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm DataGlobal RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm DataVassilis Protonotarios
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Stuart Chalk
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLAnubhav Jain
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...CSCJournals
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptxmantatheralyasriy
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptxmantatheralyasriy
 
2014 genome informatics Linked Data
2014 genome informatics Linked Data2014 genome informatics Linked Data
2014 genome informatics Linked DataENCODE-DCC
 
Jeff Grethe: CAMERA
Jeff Grethe: CAMERAJeff Grethe: CAMERA
Jeff Grethe: CAMERAIddo
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataStuart Chalk
 
Data management plans archeology class 10 18 2012
Data management plans archeology class 10 18 2012Data management plans archeology class 10 18 2012
Data management plans archeology class 10 18 2012Elizabeth Brown
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and modelsmyGrid team
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositoriesandrea huang
 

Similar to The swings and roundabouts of a decade of fun and games with Research Objects (20)

RO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital ObjectsRO-Crate: packaging metadata love notes into FAIR Digital Objects
RO-Crate: packaging metadata love notes into FAIR Digital Objects
 
FAIRy Stories
FAIRy StoriesFAIRy Stories
FAIRy Stories
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
 
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
Leveraging the CEDAR Workbench for Ontology-linked Submission of Adaptive Imm...
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
 
IBC FAIR Data Prototype Implementation slideshow
IBC FAIR Data Prototype Implementation   slideshowIBC FAIR Data Prototype Implementation   slideshow
IBC FAIR Data Prototype Implementation slideshow
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Global RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm DataGlobal RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm Data
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
 
A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...A consistent and efficient graphical User Interface Design and Querying Organ...
A consistent and efficient graphical User Interface Design and Querying Organ...
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
2014 genome informatics Linked Data
2014 genome informatics Linked Data2014 genome informatics Linked Data
2014 genome informatics Linked Data
 
Jeff Grethe: CAMERA
Jeff Grethe: CAMERAJeff Grethe: CAMERA
Jeff Grethe: CAMERA
 
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
 
Data management plans archeology class 10 18 2012
Data management plans archeology class 10 18 2012Data management plans archeology class 10 18 2012
Data management plans archeology class 10 18 2012
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositories
 

More from Carole Goble

The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...Carole Goble
 
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...Carole Goble
 
Research Software Sustainability takes a Village
Research Software Sustainability takes a VillageResearch Software Sustainability takes a Village
Research Software Sustainability takes a VillageCarole Goble
 
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...Carole Goble
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
Open Research: Manchester leading and learning
Open Research: Manchester leading and learningOpen Research: Manchester leading and learning
Open Research: Manchester leading and learningCarole Goble
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
 
What is Reproducibility? The R* brouhaha and how Research Objects can help
What is Reproducibility? The R* brouhaha and how Research Objects can helpWhat is Reproducibility? The R* brouhaha and how Research Objects can help
What is Reproducibility? The R* brouhaha and how Research Objects can helpCarole Goble
 
FAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsFAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsCarole Goble
 
Reproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpReproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpCarole Goble
 
Reflections on a (slightly unusual) multi-disciplinary academic career
Reflections on a (slightly unusual) multi-disciplinary academic careerReflections on a (slightly unusual) multi-disciplinary academic career
Reflections on a (slightly unusual) multi-disciplinary academic careerCarole Goble
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsCarole Goble
 

More from Carole Goble (12)

The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
 
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science,  a Digital Research...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
 
Research Software Sustainability takes a Village
Research Software Sustainability takes a VillageResearch Software Sustainability takes a Village
Research Software Sustainability takes a Village
 
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
Open Research: Manchester leading and learning
Open Research: Manchester leading and learningOpen Research: Manchester leading and learning
Open Research: Manchester leading and learning
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 
What is Reproducibility? The R* brouhaha and how Research Objects can help
What is Reproducibility? The R* brouhaha and how Research Objects can helpWhat is Reproducibility? The R* brouhaha and how Research Objects can help
What is Reproducibility? The R* brouhaha and how Research Objects can help
 
FAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsFAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research Commons
 
Reproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpReproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects help
 
Reflections on a (slightly unusual) multi-disciplinary academic career
Reflections on a (slightly unusual) multi-disciplinary academic careerReflections on a (slightly unusual) multi-disciplinary academic career
Reflections on a (slightly unusual) multi-disciplinary academic career
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 

Recently uploaded

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 

Recently uploaded (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 

The swings and roundabouts of a decade of fun and games with Research Objects

  • 1. The swings and roundabouts of a decade of fun and games with Research Objects Professor Carole Goble The University of Manchester, ELIXIR-UK Software Sustainability Institute UK carole.goble@manchester.ac.uk 17th Italian Research Conference on Digital Libraries (IRCDL) 2021, 18th February 2021
  • 2. I work at supporting Data Driven Research at three scales Across the research data lifecycle Life Sciences, Biodiversity social sciences, astronomy, digital libraries, chemistry etc EU Research Infrastructures
  • 4. Methods De novo assembly and binning Raw reads from each run were first assembled with SPAdes v.3.10.020 with option --meta21. Thereafter, MetaBAT 215 (v.2.12.1) was used to bin the assemblies using a minimum contig length threshold of 2,000 bp (option --minContig 2000) and default parameters. Depth of coverage required for the binning was inferred by mapping the raw reads back to their assemblies with BWA-MEM v.0.7.1645 and then calculating the corresponding read depths of each individual contig with samtools v.1.546 (‘samtools view -Sbu’ followed by ‘samtools sort’) together with the jgi_summarize_bam_contig_depths function from MetaBAT 2. The QS of each metagenome-assembled genome (MAG) was estimated with CheckM v.1.0.722 using the lineage_wf workflow and calculated as: level of completeness − 5 × contamination. Ribosomal RNAs (rRNAs) were detected with the cmsearch function from INFERNAL v.1.1.247 (options -Z 1000 --hmmonly --cut_ga) using the Rfam48 covariance models of the bacterial 5S, 16S and 23S rRNAs. Total alignment length was inferred by the sum of all non-overlapping hits. Each gene was considered present if more than 80% of the expected sequence length was contained in the MAG. Transfer RNAs (tRNAs) were identified with tRNAscan-s.e. v.2.049 using the bacterial tRNA model (option -B) and default parameters. Classification into high- and medium-quality MAGs was based on the criteria defined by the minimum information about a metagenome-assembled genome (MIMAG) standards23 (high: >90% completeness and <5% contamination, presence of 5S, 16S and 23S rRNA genes, and at least 18 tRNAs; medium: ≥ 50% completeness and <10% contamination). Assignment of MAGs to reference databases Four reference databases were used to classify the set of MAGs recovered from the human gut assemblies: HR, RefSeq, GenBank and a collection of MAGs from public datasets. HR comprised a total of 2,468 high-quality genomes (>90% completeness, <5% contamination) retrieved from both the HMP catalogue (https://www.hmpdacc.org/catalog/) and the HGG8. From the RefSeq database, we used all the complete bacterial genomes available (n = 8,778) as of January 2018. In the case of GenBank, a total of 153,359 bacterial and 4,053 eukaryotic genomes (3,456 fungal and 597 protozoan genomes) deposited as of August 2018 were considered. Lastly, we surveyed 18,227 MAGs from the largest datasets publicly available as of August 201813,16,17,18,19, including those deposited in the Integrated Microbial Genomes and Microbiomes (IMG/M) database52. For each database, the function ‘mash sketch’ from Mash v.2.053 was used to convert the reference genomes into a MinHash sketch with default k-mer and sketch sizes. Then, the Mash distance between each MAG and the set of references was calculated with ‘mash dist’ to find the best match (that is, the reference genome with the lowest Mash distance). Subsequently, each MAG and its closest relative were aligned with dnadiff v.1.3 from MUMmer 3.2354 to compare each pair of genomes with regard to the fraction of the MAG aligned (aligned query, AQ) and ANI. https://doi.org/10.1038/s41586-019-0965-1 scripts, workflows, SOPs & datasets
  • 5. scripts, workflows, SOPs & datasets https://doi.org/10.1038/s41586-019-0965-1
  • 6. Objects are the Outcomes of Research research outcomes are more than just publications and data software, models, workflows, SOPs, lab protocols…. all are first class citizens of scholarship information required to make research FAIR and Reproducible (FAIR+R) …
  • 7. From FAIR data to FAIR Digital Objects To be FAIR each digital object type has its own metadata requirements, and may have its own repositories and registries FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units: https://doi.org/10.3390/publications8020021
  • 8. From FAIR data to FAIR Digital Objects The FAIR Guiding Principles for Data Stewardship and Management Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  • 9. Objects are the Outcomes of Research Each object has its own metadata and repositories
  • 10. De-contextualised Static, Fragmented Lost Semantic linking Contextualised Active, Unified Semantic linking Buried in a PDF figure Scattered Reporting and Reading
  • 11. Data Files Methods: Scripts, CompWorkflows URLs to external data Since 2007… https://myexperiment.org
  • 12. Systems Biology Structured, interrelated objects in context Data files SOP docs Models URLs to resources Since 2008… https://fairdomhub.org
  • 13. shareable, cite-able, exchangeable resource, with versioning and snapshots Research Object general framework - research outcomes related and bundled together
  • 14. enriching resources and collections with additional information required to make research FAIR and reproducible metadata describing content and context dependencies, versions, relationships, provenance, annotations … DCAT Schema.org EML MIAPPE CodeMeta SBML Bioschemas/CWL DataCite
  • 15. integrated view over fragmented resources using PIDs bigger on the inside than the outside encapsulated content and references to external resources
  • 16. has its own metadata, can be registered and deposited in its own right, unpackaged and accessed, activated and reproduced if appropriate encapsulated content and references to external resources
  • 17. RO is the Unit of Knowledge Exchange Between scholarly communication and research infrastructure platforms -> Between researchers
  • 18. From the big picture to the small picture in practice
  • 19. standards-based metadata framework for bundling resources with context into citable reproducible packages researchobject.org Linked Data Bechhofer et al (2013)Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects:Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/ Since 2010…
  • 20. self-describing, chiefly metadata, objects RO Metadata file Structured metadata about the RO and content files links to web resources RO Content Archive file format / packaging system BagIt, zip OCFL, Git type id description datePublished … directories license author organisation
  • 21. self-describing, chiefly metadata, objects RO Metadata file Structured metadata about the RO and content image file links to web resources RO Content Archive file format / packaging system directory of data type, id description datePublished creator size format … " https://zenodo.org/record/3541888 https://github/script type id description datePublished … license author organisation “Web Linked Data” approach
  • 22. self-describing, chiefly metadata, objects RO Metadata file Structured metadata about the RO and content image file links to web resources RO Content Archive file format / packaging system directory of data type, id description datePublished creator size format … " https://zenodo.org/record/3541888 type id description datePublished … license author organisation Workflow RO “Web Linked Data” approach
  • 23. self-describing, chiefly metadata, objects How do we describe the metadata? PIDs + JSON-LD + Schema.org descriptors How can I add additional metadata? Schema.org, domain ontologies How do I define a checklist of what is expected to be in a type of RO? RO Profile http://www.researchobject.org/ro-crate/ Standard Web Mark-up
  • 24. RO-Crate in a nutshell Practical lightweight approach to packaging research data entities (any object) with metadata Aggregate files and/or any URI-addressable content, with contextual information to aid decisions about re-use:Who WhatWhenWhereWhy How. Web Native Machine readable. Human readable. Search engine friendly. Familiar. Extensible and Incremental: add additional metadata; nested and typed by their profile. Open Community effort 38 contributors fromAU, EU and US http://www.researchobject.org/ro-crate/
  • 25. Lifting the Skirts to look at the underwear https://w3id.org/ro/crate/1.1 Released 30th October 2020 http://www.researchobject.org/ro-crate/
  • 26. Community specific Combine Archive Publisher specific [Bergmann et al. 2014]. Platform specific DataONE A trend
  • 28. Cultural Heritage: A data curation service for endangered languages: 500,000 files in 28,624 items and 574 collections long term preservation and accessibility of research data objects https://arkisto-platform.github.io/ [Marco La Rosa, Peter Sefton]
  • 29. Describe the data as it’s being created using open standards and tools source: https://guides.library.ucsc.edu/datamanagement/ [Marco La Rosa, Peter Sefton]
  • 30. Scalable verified collections of references Processing big genomic & clinical data distributed over multiple locations NIH Data Commons [Chard, et al 2016] https://doi.org/10.1109/BigData.2016.7840618 minids Retain and archive processed datasets Reference and transfer large data on demand Controlled access to sensitive data [Kesselman, Foster]
  • 31. Data and Method Commons 13 EU Life Science Research Infrastructures Sharing data, tools and workflows in the cloud 100+ data collections method commons
  • 32. Data and Method Commons 13 EU Life Science Research Infrastructures exchanging and preserving secure data objects Sharing data, tools and workflows in the cloud interchange, stewardship, recording dependencies -> portability & reuse/reproducibility of workflow objects
  • 34. Do Research Assemble Methods, Materials Analyse Results Quality Assessment Track and Credit Disseminate Deposit & Licence Publish Share Results Manage Results Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015 Experiment Observe Simulate Describe and release the data , workflows, research as its being created, updated and used
  • 35. Getting embedded into the European Open Science Cloud Collaborative environments and EGI-ACE data spaces for Earth Science researchers RO-Crate as interchange format and integration with Zenodo, using Describo Pan-European natively FAIR and GDPR compliant data storage and sharing fabric Science Mesh using Cloud Services for Synchronization and Sharing (CS3) RO to manage research activities and release snapshots to Zenodo
  • 36. exchange between genomics platform and repository. standardise and share analyses generated from genome sequencing. Scholarly Communication Drivers Exchange & Import/Export Reuse & Reproduce Report & Archive Living Objects Share & Access Figure: RDMKit, https://rdmkit.elixir-europe.org/
  • 37. Swings and Roundabouts why is it called “RO-Crate”??
  • 38.
  • 39. Activation & research in a project Adoption by other projects Adoption Adoption in mainstream infrastructure
  • 40. Research ObjectVersion 1 2010-2017 http://wf4ever.org/ Sharing of data-intensive computational workflows Preservation and managing their decay • documentation of workflows • interconnections between workflows and related resources (e.g., datasets, publications, etc.) • documentation of their dependencies and their outputs • social aspects Type-cast  not just for workflows or biology!
  • 41. Howard Ratner, Chair STM Future Labs Committee, CEO EVP Nature PublishingGroup Director of Development for CHORUS 2012 2017
  • 42. adoption stuck in inner groups not mainstream or outsiders a swing back to basics reboot!
  • 43. Machine-processable Standards Low tech Graceful degradation Commodity tooling Incremental Multi-platform Technology Independent Keep it simple E X A M P L E S Developer friendliness A swing back to basics "just enough complexity / standards” sufficient extra benefits from what already exists… …without compromising the developer entry-level experience so much that they rather do their own thing.
  • 44. Version 1 Linked Data Purity Construct metadata file Describe metadata content Mismash of ontologies combined together to describe metadata file RDF Shapes Represent profiles Hard to make checklists SHACL, ShEx W3C Web Annotation Vocabulary OAI Object Exchange and Reuse
  • 45. Indeed. Linked Data is not enough. And sometimes its too much. Research Infrastructures: “digital technologies (hardware, software), resources (data, services, digital libraries, standards), comms (protocols, access rights, networks), people and organisational structures”
  • 46. Tensions Research Infrastructures sit in the middle Academic Viewpoint Infrastructure Viewpoint Green field site Theoretical purity Use latest thing Proof of concept Sophistication Narrow developer audience Strive for super generic The end Exposing the tech Pre-existing platforms Practicality Use things that work Production Simplicity & Familiarity Wide developer audience Several specific is ok! The means Hiding the tech
  • 47. 2018: Digital Libraries & Developer friendly reboot Peter Sefton Semantic Web world vs RealWorld Open Repositories & Dig Lib Community DataCrate simple web stack RO rich RDF stack + fewer features easier to understand, conceptually simpler opinionated guide to current best practices software stacks widely used on the Web
  • 48. Just Enough Linked Data Just InTime simplifications rather than generalizations Retain benefits of Linked Data querying, graph stores, vocabularies, clickable URIs customization and conventions The stuff a developer needs documentation, examples, libraries, tools, community limited flexibility frees up developers familiarity is important Linked Data “exotics” there for when the time is right if needed by the right people +
  • 49. Developer friendly ->Tooling, Libraries How can I use it? While we’re mostly focusing on the specification, some tools already exist for working with RO-Crates: ● Describo interactive desktop application to create, update and export RO-Crates for different profiles. (~ beta) ● CalcyteJS is a command-line tool to help create RO-Crates and HTML-readable rendering (~ beta) ● ro-crate - JavaScript/NodeJS library for RO-Crate rendering as HTML. (~ beta) ● ro-crate-js - utility to render HTML from RO-Crate (~ alpha) ● ro-crate-ruby Ruby library to consume/produce RO-Crates (~ alpha) ● ro-crate-py Python library to consume/produce RO-Crates (~ planning) These applications use or expose RO-Crates: ● Workflow Hub imports and exports Workflow RO-Crates ● OCFL-indexer NodeJS application that walks the Oxford Common File Layout on the file system, validate RO-Crate Metadata Files and parse into objects registered in Elasticsearch. (~ alpha) ● ONI indexer ● ocfl-tools ● ocfl-viewer ● Research Object Composer is a REST API for gradually building and depositing Research Objects according to a pre-defined profile. (RO-Crate support alpha) ● … (yours?) https://uts-eresearch.github.io/describo/
  • 50. People! A Community Sponsored RO2018, Amsterdam e-Science Conference https://www.researchobject.org/ro-crate/#contribute Diverse set of people Variety of stakeholders Set of collective norms Open platform for communication RO2018
  • 51. People! Leadership, Champions, Early Adopters we had to reboot the community twice…. Stian Soiland-Reyes The University of Manchester, UK Peter Sefton University ofTechnology Sydney Infrastructure buy in lifts from project to mainstream
  • 52. tendency to first focus on providers… …but (developer) consumers build the ecosystem…
  • 53. Consumers Archivist and library specialists know the importance of metadata and standards… … and for things to work 5, 10, 20 years later. End-users and Developers handle legacy … …. their field, repositories, institutions , journals etc. will always be lagging behind the curve Researchers, to make their object FAIR … … do not have the resources or know-how…. Specification & standards-based realisation Incremental & Extensible, Practical, Familiar Built in to their platforms they use in their day-to-day work
  • 54. Problem Community Reference Examples Tools Best Practice Guides Tutorials Embedded Go Round and Round and Round & Maintain Momentum
  • 55. Back to the Big Picture. Opportunities for Scholarly Communications
  • 56. I concentrated on developers….what about end users? Big Picture User-Ware Familiar tools in the research process Infrastructure Under-Ware Familiar tech in the infrastructure Registries Repositories Tools Applications Built in On ramps Metadata automation
  • 57. I concentrated on developers….what about end users? Infrastructure RO-Crate Commons Ecosystem Knowledge graphs Citation and tracking Reproducibility Release updates New services Built by developers New user capabilities Living and actionable publications whose content transcends platforms threaded publications FAIR Digital Objects
  • 58. Back to FAIR Digital Objects Actionable knowledge unit Digital butterfly – digital twins Bags of references courtesy Dimitris Koureas Coordinator DiSSCo EU Research Infrastructure Specimen object image courtesy of Alex Hardisty [Hardisty et al, 2020]
  • 59. European Open Science Cloud EOSC Interoperability Framework Specimen Data Refinery Workflows to Digitise Natural History Specimens FAIR Digital Object Framework Open Digital Specimen Workflow Infrastructure RO-Crate + https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5- 01aa75ed71a1/language-en/format-PDF/source-190308283
  • 60. FAIR Principles for Digital Research Objects FAIR all the way down Unbounded FAIR Distributed FAIR Living FAIR Analogous to FAIR Software FAIR RO-Crate is a practical start Fig from EOSC Interoperability Framework
  • 61. Preparing for Digital Objects….. Digital library community allies! Make Research Object’s normative – Promote to researchers but … – ... target Research Infrastructures to deliver – Move RO(-Crate)s across the scholarly ecosystem Developer friendliness matters – Persuasive design FAIR principles for Research Objects…. – Unifying the vision with the practical Release paradigm instead of a Publish one
  • 62. Barend Mons Sean Bechhofer Matthew Gamble Raul Palma Jun Zhao Mark Robinson AlanWilliams Norman Morrison Stian Soiland-Reyes Tim Clark Alejandra Gonzalez-Beltran Philippe Rocca-Serra Ian Cottam Susanna Sansone Kristian Garza DanielGarijo Catarina Martins Iain Buchan Michael Crusoe Rob Finn StuartOwen Finn Bacall Bert Droebeke Laura Rodríguez Navas Ignacio Eguinoa Carl Kesselman Ian Foster Kyle Chard Vahan Simonyan Ravi Madduri Raja Mazumder GilAlterovitz Denis Dean II DurgaAddepalli Wouter Haak Anita DeWaard Paul Groth Oscar Corcho Peter Sefton Eoghan Ó Carragáin FrederikCoppens Jasper Koehorst Simone Leo Nick Juty LJ Garcia Castro Karl Sebby Alexander Kanitz Ana Trisovic http://researchobject.org Gavin Kennedy Mark Graves José María Fernández Jose Manuel Gomez-Perez Jason A. Clark Salvador Capella-Gutierrez Alasdair J. G. Gray Kristi Holmes Giacomo Tartari Hervé Ménager Paul Walk Brandon Whitehead Erich Bremer Mark Wilkinson Jen Harrow Marco La Rosa And many more....

Editor's Notes

  1. The swings and roundabouts of a decade of fun and games with Research Objects http://ircdl2021.dei.unipd.it/ 17th Italian Research Conference on Digital Libraries Information and Research Science connecting to Digital and Library Science Over the past decade we have been working on the concept of Research Objects http://researchobject.org). We worked at the 20,000 feet level as a new form of FAIR+R scholarly communication and at the 5 feet level as a packaging approach for assembling all the elements of a research investigation, like the workflows, data and tools associated with a publication. Even in our earliest papers [1] we recognised that there was more to success than just using Linked Data. We swing from fancy RDF models and SHACL that specialists could use to simple RO-Crates and JSON-LD that mortal programmers could use. Success depends on a use case need, a community and tools which we found particularly in the world of computational workflows right from the outset. All along there have been politics and that has recently got even hotter. All fun and games! [1] Sean Bechhofer, et al (2013) Why Linked Data is Not Enough for Scientists FGCS 29(2)599-611 https://doi.org/10.1016/j.future.2011.08.004 v
  2. Data pipeline & analysis reporting & reproducibility
  3. https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5-01aa75ed71a1/language-en/format-PDF/source-190308283 https://www.eoscsecretariat.eu/sites/default/files/eosc-interoperability-framework-v1.0.pdf Schwardmann (2020), Digital Objects – FAIR Digital Objects: Which Services Are Required? Data Science Journal EOSC Interoperability Framework Draft (2020) Hardisty A, et al (2020) Conceptual design blueprint for the DiSSCo digitization infrastructure RIO 6: e54280. DONA Digital Object Architecture Digital Object Interface Protocol (2018) https://fairdigitalobjectframework.org/
  4. The FAIR Guiding Principles for Data Stewardship and Management Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  5. Annotations about these resources, to improve understanding and interpretation Data used and results produced in experimental study​ Methods employed to produce and analyse that data​ Provenance and settings for the experiments​ People, specimens, equipment etc involved in the investigation​
  6. Overcome fragmentation
  7. Overcome fragmentation
  8. Added after presentation
  9. So we came up with ROs
  10. Identifiers to locate things Organisation to structure & link things Annotations about the things @type: MUST be Dataset @id: MUST end with / and SHOULD be the string ./ name: SHOULD identify the dataset to humans well enough to disambiguate it from other RO-Crates description: SHOULD further elaborate on the name to provide a summary of the context in which the dataset is important. datePublished: MUST be a string in ISO 8601 date format and SHOULD be specified to at least the precision of a day, MAY be a timestamp down to the millisecond. license: SHOULD link to a Contextual Entity in the RO-Crate Metadata File with a name and description. MAY have a URI (eg for Creative Commons or Open Source licenses). MAY, if necessary be a textual description of how the RO-Crate may be used. This document specifies a method, known as RO-Crate (Research Object Crate), of organizing file-based data with associated metadata, using linked data principles, in both human and machine readable formats, with the ability to include additional domain-specific metadata. The core of RO-Crate is a JSON-LD file, the RO-Crate Metadata File, named ro-crate-metadata.json. This file contains structured metadata about the dataset as a whole (the Root Data Entity) and, optionally, about some or all of its files. This provides a simple way to, for example, assert the authors (e.g. people, organizations) of the RO-Crate or one its files, or to capture more complex provenance for files, such as how they were created using software and equipment. While providing the formal specification for RO-Crate, this document also aims to be a practical guide for software authors to create tools for generating and consuming research data packages, with explanation by examples. At the basic level, an RO-Crate is a collection of files and resources represented as a Schema.org Dataset, that together form a meaningful unit for the purposes of communication, citation, distribution, preservation, etc. The RO-Crate Metadata File describes the RO-Crate, and MUST be stored in the RO-Crate Root. While RO-Crate is well catered for describing a Dataset as files and relevant metadata that are contained by the RO-Crate in the sense of living within the same root directory, RO-Crates can also reference external resources which are stored or accessed separately, via absolute URIs. This is particularly recommended where some resources cannot be co-hosted for practical or legal reasons, or if the RO-Crate itself is primarily web-based. It is important to note that the RO-Crate Metadata File is not an exhaustive manifest or inventory, that is, it does not necessarily list or describe all files in the package. Rather it is focused on providing sufficient amount of metadata to understand and use the content, and is designed to be compatible with existing and future approaches that do have full inventories / manifest and integrity checks, e.g. by using checksums, such as BagIt and Oxford Common File Layout OCFL Objects. The intention is that RO-Crates can work well with a variety of archive file formats, e.g. tar, zip, etc., and approaches to capturing file manifests and file fixity, such as BagIt, OCFL and git (see also appendix Combining with other packaging schemes). An RO-Crate can also be hosted on the web or mainly refer to web resources, although extra care to ensure persistence and consistency should be taken for archiving such RO-Crates.
  11. Data might @type: MUST be Dataset @id: MUST end with / and SHOULD be the string ./ name: SHOULD identify the dataset to humans well enough to disambiguate it from other RO-Crates description: SHOULD further elaborate on the name to provide a summary of the context in which the dataset is important. datePublished: MUST be a string in ISO 8601 date format and SHOULD be specified to at least the precision of a day, MAY be a timestamp down to the millisecond. license: SHOULD link to a Contextual Entity in the RO-Crate Metadata File with a name and description. MAY have a URI (eg for Creative Commons or Open Source licenses). MAY, if necessary be a textual description of how the RO-Crate may be used. This document specifies a method, known as RO-Crate (Research Object Crate), of organizing file-based data with associated metadata, using linked data principles, in both human and machine readable formats, with the ability to include additional domain-specific metadata. The core of RO-Crate is a JSON-LD file, the RO-Crate Metadata File, named ro-crate-metadata.json. This file contains structured metadata about the dataset as a whole (the Root Data Entity) and, optionally, about some or all of its files. This provides a simple way to, for example, assert the authors (e.g. people, organizations) of the RO-Crate or one its files, or to capture more complex provenance for files, such as how they were created using software and equipment. While providing the formal specification for RO-Crate, this document also aims to be a practical guide for software authors to create tools for generating and consuming research data packages, with explanation by examples. At the basic level, an RO-Crate is a collection of files and resources represented as a Schema.org Dataset, that together form a meaningful unit for the purposes of communication, citation, distribution, preservation, etc. The RO-Crate Metadata File describes the RO-Crate, and MUST be stored in the RO-Crate Root. While RO-Crate is well catered for describing a Dataset as files and relevant metadata that are contained by the RO-Crate in the sense of living within the same root directory, RO-Crates can also reference external resources which are stored or accessed separately, via absolute URIs. This is particularly recommended where some resources cannot be co-hosted for practical or legal reasons, or if the RO-Crate itself is primarily web-based. It is important to note that the RO-Crate Metadata File is not an exhaustive manifest or inventory, that is, it does not necessarily list or describe all files in the package. Rather it is focused on providing sufficient amount of metadata to understand and use the content, and is designed to be compatible with existing and future approaches that do have full inventories / manifest and integrity checks, e.g. by using checksums, such as BagIt and Oxford Common File Layout OCFL Objects. The intention is that RO-Crates can work well with a variety of archive file formats, e.g. tar, zip, etc., and approaches to capturing file manifests and file fixity, such as BagIt, OCFL and git (see also appendix Combining with other packaging schemes). An RO-Crate can also be hosted on the web or mainly refer to web resources, although extra care to ensure persistence and consistency should be taken for archiving such RO-Crates.
  12. Other ROs might be embedded or linked An RO might JUST be metadata and a structured @type: MUST be Dataset @id: MUST end with / and SHOULD be the string ./ name: SHOULD identify the dataset to humans well enough to disambiguate it from other RO-Crates description: SHOULD further elaborate on the name to provide a summary of the context in which the dataset is important. datePublished: MUST be a string in ISO 8601 date format and SHOULD be specified to at least the precision of a day, MAY be a timestamp down to the millisecond. license: SHOULD link to a Contextual Entity in the RO-Crate Metadata File with a name and description. MAY have a URI (eg for Creative Commons or Open Source licenses). MAY, if necessary be a textual description of how the RO-Crate may be used. This document specifies a method, known as RO-Crate (Research Object Crate), of organizing file-based data with associated metadata, using linked data principles, in both human and machine readable formats, with the ability to include additional domain-specific metadata. The core of RO-Crate is a JSON-LD file, the RO-Crate Metadata File, named ro-crate-metadata.json. This file contains structured metadata about the dataset as a whole (the Root Data Entity) and, optionally, about some or all of its files. This provides a simple way to, for example, assert the authors (e.g. people, organizations) of the RO-Crate or one its files, or to capture more complex provenance for files, such as how they were created using software and equipment. While providing the formal specification for RO-Crate, this document also aims to be a practical guide for software authors to create tools for generating and consuming research data packages, with explanation by examples. At the basic level, an RO-Crate is a collection of files and resources represented as a Schema.org Dataset, that together form a meaningful unit for the purposes of communication, citation, distribution, preservation, etc. The RO-Crate Metadata File describes the RO-Crate, and MUST be stored in the RO-Crate Root. While RO-Crate is well catered for describing a Dataset as files and relevant metadata that are contained by the RO-Crate in the sense of living within the same root directory, RO-Crates can also reference external resources which are stored or accessed separately, via absolute URIs. This is particularly recommended where some resources cannot be co-hosted for practical or legal reasons, or if the RO-Crate itself is primarily web-based. It is important to note that the RO-Crate Metadata File is not an exhaustive manifest or inventory, that is, it does not necessarily list or describe all files in the package. Rather it is focused on providing sufficient amount of metadata to understand and use the content, and is designed to be compatible with existing and future approaches that do have full inventories / manifest and integrity checks, e.g. by using checksums, such as BagIt and Oxford Common File Layout OCFL Objects. The intention is that RO-Crates can work well with a variety of archive file formats, e.g. tar, zip, etc., and approaches to capturing file manifests and file fixity, such as BagIt, OCFL and git (see also appendix Combining with other packaging schemes). An RO-Crate can also be hosted on the web or mainly refer to web resources, although extra care to ensure persistence and consistency should be taken for archiving such RO-Crates.
  13. Portland Common Data Model for digital library & repository content Metadata is Flat JSON file Descriptions of the Objects Descriptions of the RO-Crate Descriptions relating the Objects human and machine readable formats, additional domain-specific metadata.
  14. Added after presentation
  15. Released 30th October 2020
  16. We wanted to be platform/community agnostic
  17. PARADISEC and The University of Melbourne University of Technology Sydney PARADISEC (the Pacific And Regional Archive for Digital Sources in Endangered Cultures) has operated a data curation service for endangered languages for 18 years. Currently consisting of 500,000 files in 28,624 items and 574 collections, there is over 90TB of data under management. The repository, whilst still functional and fit for purpose, is showing signs of its age. Using the Arkisto Platform, the Modern PARADISEC project is updating the catalog whilst also enabling a path for smaller regional repositories to exist and to easily move their content across collections and institutions.
  18. For research data to be Findable, and Interoperable and Reusable, it has to have metadata that makes it so - and this needs to be both human and machine readable so that finding aids can be built and people can search, or write programs to discover and use data. Also for research data to be distributed it has to be *packaged* into distributable objects - otherwise there is no digital object. This talk looks at one tool that implements the Research Object Crate standard that supports such FAIR packaging.
  19. NIH Data Commons transferring and archiving very large HTS datasets in a location-independent way keep the context of data content together when its scattered. Scalability
  20. The Data-Driven Open Science, e-Laboratories ROs all the way along, not just the end Inside and outside Elsewhere, on-date Within, during Reproducible Research Environment Integrated infrastructure for producing and working with reproducible research. Reproducible Research Publication Environment Distributing and reviewing; academic credit; legal licensing etc. Such a merge between research infrastructure and research marketplace would overcome known "cultural barriers" and the "methodological barriers" mentioned above. In the RI scope, research products publishing would: be facilitated by the very ICT services that are generating them; take advantage of research activity awareness and product interlinking; support access rights issues that fit the need of the community; and subsume the major costs of storing and curating products. In other words, by bringing marketplace features within the RI, researchers can finally achieve their best effort in terms of scholarly communication. Science 2.0 Repositories: Time for a Change in Scholarly Communication Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Italy {assante, candela, castelli, manghi, pagano}@isti.cnr.it DOI: 10.1045/january2015-assante
  21. Data Cubes for efficient and scalable structured data access and discovery Research Objects as the mechanism to manage scientific research activities and connect associated resources RO snapshot/releases published to EOSC repositories and scholarly communication platforms (e.g., B2Share, Zenodo
  22. Journey
  23. Journey
  24. Howard Ratner, Chair STM Future Labs Committee, CEO EVP Nature Publishing Group Director of Development for CHORUS (Clearinghouse for the Open Research of US) STM Innovations Seminar 2012 projects
  25. https://www.researchobject.org/specs/ W3C Web Annotation Vocabulary OAI Object Exchange and Reuse Describe manifest content Wf4Ever RO ontology, Wf4Ever ROEvo … Dublin Core, FOAF, SIOC, Creative Commons, PROV, PAV… The first implementation of Research Objects in 2009 for sharing workflows in myExperiment https://doi.org/10.1093/nar/gkq429 was based on RDF ontologies http://eprints.soton.ac.uk/id/eprint/267787, building on Dublin Core, FOAF, SIOC, Creative Commons, OAI-ORE and (later) DBPedia to form myExperiment ontologies for describing social networking, attribution and creditation, annotations, aggregation packs, experiments, view statistics, contributions, and workflow components. http://web.archive.org/web/20091115080336/http://rdf.myexperiment.org/ontologies Programmatic access to Research Objects was facilitated with an RDF endpoint that exposed individual myExperiment resources, also queriable from a SPARQL endpoint, both using the myExperiment vocabularies and RDF formats RDF/XML and Turtle. Re-use of existing vocabularies. Making lots of new vocabularies. - too many?? Stack becomes very large to learn and use. RDF as its own worst enemy. Need for simplicity. Cater for "web developer". Re-use of existing vocabularies. Making lots of new vocabularies. - too many?? Stack becomes very large to learn and use. RDF as its own worst enemy. Need for simplicity. Cater for "web developer".
  26. Sometimes its too much The irony It is for research infrastructures not research
  27. We started in theory, but we work in practice ROs sit in the middle
  28. “Just enough standards” cf. schema.org What I would like to see in the end is how we are NOT throwing out the Linked Data foundation - and that we want to retain all the benefits, e.g. querying, graph stores, vocabularies, clickable URI as identifiers - but they are mere things we want to keep in our toolbox rather than a goal in itself - and as such the toolbox also have the other things developers expects; documentation, examples, libraries, simplifications rather than generalizations [removing unneeded degrees of freedom like multiple RDF formats]; showing them how to do the Right Thing (tm) almost by accident. Then they can - if they want to and when time is right - pick up some of those shiny tools in the bottom of the toolbox. Just like in cars - they benefit a lot from reusing standards - like communication buses, agreeing on mounts and screws, almost interchangable parts so that manufacturers can mix and match when they put it together. But you as a consumer don't care until you realize your exhaust pipe is broken and the mechanic now can choose how to fix it rather than be restrained to one single manufacturer. some kind of "Why not just plain JSON?" point perhaps. We are not after the exotic features that you will hear about in the rest of the conference, but the bog standard tooling that arguably you could also do in a NoSQL database with enough customization and conventions. But rather than invent our own C&C we retain the Linked Data conventions - "Just enough standards" - following the lead of schema.org
  29. Sponsored by BioExcel (GitHub, Google Docs, monthly telcons)
  30. Cross cutting Infrastructure buy in lifts from boutiques to mainstream
  31. Easy to make, easy to use?
  32. Hence desktop solutions like Describo are also important. Not sure if you mentioned Describo! Library & Information Science Network
  33. Then to can enable the big picture. Get early benefits
  34. Snapshotting evolving papers can we get to the released evolving paper? RO Middleware – added services on top, like a RO commons Big picture new scholarship …. Small picture new middleware…..
  35. [Hardisty A, et al (2020) Conceptual design blueprint for the DiSSCo digitization infrastructure RIO 6: e54280. How to make this practical!
  36. How do we persuade Scientists? We do not That’s why we need to persuade the infrastructure people What other things do we need to do Small Picture – you don’t Developer Friendliness Matters Its underware – customers are infrastructures Marrying the FDO vision with the RO Embed in Research Infrastructures