The webinar discussed FAIRDOM services that can help applicants to the ERACoBioTech call with their data management plans and requirements. FAIRDOM offers webinars on developing data management plans, and their platform and tools can help with organizing, storing, sharing, and publishing research data and models in a FAIR manner by utilizing metadata standards. Different levels of support are available, from general community resources through their hub, to premium customized support for individual projects. Consortia can include FAIRDOM as a subcontractor within the guidelines of the ERACoBioTech call.
ERA CoBioTech Data Management Webinar
1. ERACoBioTech
data management webinar
The FAIRDOM Consortium
http://fair-dom.org, http://fairdomhub.org
these slides:
FAQ: http://tinyurl.com/fairdom-eracobiotechfaq
2. FAIRDOM Services for the co-funded call
• Applicants are encouraged (but not obliged) to utilise the DM
services offered through the central data management project
FAIRDOM, and to participate in the voluntary Open Research
Data Pilot in Horizon 2020.
• Webinars to support the consortia in developing and preparing
a data management plan will be provided in both proposal
preparation phases (announced on https://www.submission-cobiotech.eu/).
– First phase: Feb 2 and Feb 7
• More information on data management requirements within
this call is detailed in ANNEX 5: Data management.
• Pre-proposals
– Indication of data management plans and data
management infrastructure and where appropriate data
management provider
• Full proposals
– Cost of data management clearly budgeted
– Data management template to guide you
– Detailed data management plan
https://www.cobiotech.eu/
8. Data Management Planning Checklist
• General
• What data will be collected or created as part of the study (RAW data)?
• What data will be produced by processing the RAW data (Secondary, processed data)?
• Are existing data being re-used (if any)?
• What is the origin of the data?
• What are the types and formats you plan to use for the data generated/collected (raw, processed, published)?
• What data will be published as the result of your study?
• What are the cost estimates of making your data FAIR?
• Do you have any national/funder/sectorial/departmental procedures for data management?
Based on H2020 FAIR Guidelines
9. Based on H2020 FAIR Guidelines
Volume and Life Cycle of the Data
• Raw data
• How much RAW data do you think will be produced (estimates per month, year, full project duration)?
• Will all of the RAW data be kept for the duration of the study or will the RAW data be deleted once it is
processed?
• For large scale RAW data (images, sequence) have you planned the local storage capacity necessary for
processing?
• Do you require help to organise a suitable local management system for RAW data?
• Do you have policies that govern the management and usage of RAW data?
• How long will RAW data be kept?
• Will there be a long-term archive?
• Secondary and Published data
• What data processing is foreseen in the project?
• How much processed data will be produced, and stored (can you make estimates per month, year, full
project)?
• How much of this data will be published (estimates per month, year, full project)?
• Does your institution, or the project funders, have policies governing the access and usage of processed
data?
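The volume questions above ask for estimates per month, per year and for the full project. A minimal sketch of that arithmetic follows; the stream names, production rates and retention fractions are made-up placeholders, not figures from the webinar, and should be replaced with your own estimates.

```python
from dataclasses import dataclass


@dataclass
class DataStream:
    """One kind of data produced by the project (all figures are estimates)."""
    name: str
    gb_per_month: float   # assumed average production rate
    kept_fraction: float  # share retained after processing, 0.0 to 1.0


def estimate_gb(stream: DataStream, months: int) -> float:
    """Volume retained for one stream over the given number of months."""
    return stream.gb_per_month * months * stream.kept_fraction


def summarise(streams: list, project_months: int) -> dict:
    """Per-stream estimates per month, per year and for the full project."""
    return {
        s.name: {
            "per_month": estimate_gb(s, 1),
            "per_year": estimate_gb(s, 12),
            "full_project": estimate_gb(s, project_months),
        }
        for s in streams
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    streams = [
        DataStream("raw", gb_per_month=200.0, kept_fraction=1.0),
        DataStream("processed", gb_per_month=20.0, kept_fraction=1.0),
        DataStream("published", gb_per_month=20.0, kept_fraction=0.25),
    ]
    for name, row in summarise(streams, project_months=36).items():
        print(name, row)
```

Setting `kept_fraction` below 1.0 for RAW data models the case where RAW data is deleted once it has been processed.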
10. Based on H2020 FAIR Guidelines
Personally sensitive data (e.g. medical data)
Data flow through the project, define what data is:
• aggregated (typically safe to share, if names cannot be recovered)
• anonymized (name cannot be recovered from the data)
• pseudonymized (name can be recovered by some)
• non-anonymized (name linked to data)
Which organisational boundaries have to be traversed by which data?
• Make sure with your local data protection officer and ethics commission that the data can be shared with your partners
along the flow described with the anonymisation levels as described.
Why local?
• Some laws change across surprising boundaries.
• E.g. in Germany, universities and other public organisations are subject to a different data protection law than enterprises.
Why seek advice?
• It may be required to be able to recover the name-data relation, e.g. to enable study participants to *leave* a study.
• What provisions will you have in place for data recovery, secure storage, and transfer of sensitive data?
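The difference between pseudonymized and anonymized data, and the need to recover the name-data relation when a participant leaves a study, can be illustrated with a small sketch. It assumes a keyed hash (HMAC-SHA256) for pseudonyms and a lookup table held only by an authorised custodian; the class and function names are hypothetical, and this is not a FAIRDOM tool nor a substitute for advice from your data protection officer.

```python
import hashlib
import hmac


def pseudonymise(name: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a participant name (keyed hash)."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:16]


class ReidentificationTable:
    """Name-to-pseudonym mapping, assumed to be kept only by the custodian."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._by_pseudonym = {}

    def register(self, name: str) -> str:
        """Record a participant and return the pseudonym to use in shared data."""
        pseudonym = pseudonymise(name, self._key)
        self._by_pseudonym[pseudonym] = name
        return pseudonym

    def withdraw(self, name: str, records: list) -> list:
        """Drop all records of a participant who leaves the study."""
        pseudonym = pseudonymise(name, self._key)
        self._by_pseudonym.pop(pseudonym, None)
        return [r for r in records if r["participant"] != pseudonym]


if __name__ == "__main__":
    table = ReidentificationTable(b"replace-with-a-real-secret")
    alice = table.register("Alice Example")
    bob = table.register("Bob Example")
    records = [{"participant": alice, "value": 1}, {"participant": bob, "value": 2}]
    print(table.withdraw("Alice Example", records))
```

Partners who receive only the pseudonymized records cannot recover names without the key, while the custodian can still honour a withdrawal request.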
11. Making Data Findable (documentation and metadata management)
• What documentation and metadata will accompany the data (assist its
discoverability)? (Details on methodology, definitions, procedures,
SOPs, vocabularies, units, dependencies, etc)
• What information is needed for the data to be read and interpreted in
the future?
• What naming conventions will be used?
• How will you approach versioning your data?
• How will you capture / create this documentation and metadata?
• How do you ensure the completeness of the captured data?
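As one way to make the naming-convention and versioning questions concrete, the sketch below checks file names against a pattern and bumps a version suffix. The convention `<project>_<assay>_<YYYYMMDD>_v<NN>.<ext>` is a hypothetical example chosen for illustration, not a FAIRDOM or H2020 requirement.

```python
import re
from typing import Optional

# Hypothetical convention: project_assay_YYYYMMDD_vNN.ext (lower case).
PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<assay>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename: str) -> Optional[dict]:
    """Return the name parts if the file follows the convention, else None."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


def next_version(filename: str) -> str:
    """Return the file name for the next version of the same data file."""
    parts = parse_name(filename)
    if parts is None:
        raise ValueError(f"not a conforming file name: {filename!r}")
    new_version = int(parts["version"]) + 1
    if new_version > 99:
        raise ValueError("version field is limited to two digits")
    return (
        f"{parts['project']}_{parts['assay']}_{parts['date']}"
        f"_v{new_version:02d}.{parts['ext']}"
    )


if __name__ == "__main__":
    for name in ["moses_rnaseq_20170202_v01.csv", "final data (2).xlsx"]:
        print(name, "->", parse_name(name))
    print(next_version("moses_rnaseq_20170202_v01.csv"))
```

Running a check like this when files are uploaded is one simple way to catch incomplete or inconsistently named data early.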
Making Data Accessible
Specify which data will be made openly available taking into consideration
• What ethics and legal compliance issues do you have if any? Do you
need consent for data preservation and sharing? Do you have to protect
certain data? Is any data sensitive?
• Do you think you might have Intellectual Property Rights issues? Have
you considered ownership of the data, licensing, restrictions on use?
• Do you think you will need to embargo any data?
• How will you make the data available? (consider the platforms you will
use: databases, repositories, etc)
• What methods or software tools are needed to access the data? Should
you include documentation detailing how to access and use the
software that is needed for accessing the data? Is it possible to include
this software with the data (e.g. source code, Docker, etc.)?
• If there are any restrictions on accessibility, how will you provide
access?
Making Data Interoperable
• What standards (metadata vocabularies, formats,
checklists) or methodologies will you use?
• How do you address data and model quality? What
validation steps do you foresee?
• Will you use standardised vocabulary for all data types
to allow inter-disciplinary interoperability?
• Where you cannot use standardised vocabulary for all
types of data, can you map to more commonly used
ontologies?
Making Data Re-usable
• How will you licence your data to permit the widest re-
use possible?
• When will the data be made available for re-use? Does
this include an embargo period? (if so, why?)
• Which data will be available for re-use during/after the
project? If not, why?
• What are your data quality assurance processes?
• How long do you expect your data to remain re-usable?
13. FAIRDOM Platforms and Tools
FAIR Hub
Web-based
Metadata catalogue
Project Hub
Results repository and
showcase
Tool gateway
Collaboration portal
Storage & analytics
On site
Tracking, analytic pipelines,
LIMS, auto-archiving
Extract, Transform and Load
direct from the instruments,
large data
Metadata annotation
Model simulation
Standards compliance
Tool Pool
Reproducible publishing
In House
Storage
System
Public
archives
14. • Trusted long-term repository
• Repository space during and after
project
• Project controlled spaces
• Working space for projects
• Show space for communicating results
• Collaboration space for partners
• Supp. materials space for publications
• Portal to project on-site repositories
• Portal to modelling tools + public
archives
Nucl.Acids Res. (2016) doi: 10.1093/nar/gkw1032
• FAIRDOMHub
• Common Space
• for projects and programmes
Find – Access – Interoperate – Reuse
Collaborate – Control – Organise – Retain
16. FAIR Collaboration and
FAIR long-lived store
758 people
58 projects + 10 more coming with isbe.NL
193 institutions
18. Find & Access in one place
• What about my big data? What about other data archives?
• Catalogues and aggregates data regardless of where it is stored.
– Organise, find and share all
experimental outputs in one
place
– Organise across on-site,
internal, secure and public
stores all from one place
– Setup on-site or in the cloud
– Use national or institutional
data storage infrastructure
– Use our managed central Hub
to upload, to organise, to
catalogue and to safely save
for the long-term
Your Onsite Store
19. Find & Access in one place
Central catalogue
– Organise all experimental data in one place
– Structure recording of experiments and files
– Link to original data, model and process files
– Mix central, public and on site stores
– Cross data store silos
Metadata tagging and standards
– Tag data, models
– Sys and SynBio standards for models and data
Intelligent search
– Search across experiments and attached files
– Rich metadata search
– Search across model repositories
Yellow pages of projects and people
– Gather people together, Find people and skills
– Collaborate
21. Find & Access in one place
FAIRDOMHub
– Store files in your space on our managed, central Hub
– Upload and download all data and model formats
– Full CSV support, Version files
– 1 TB private storage, guaranteed to 2029
Flexible Access Control to Spaces
– Share with any number of research collaborators
– Manage fine-grain access permissions
– Secure data transfer and access
Federate with on site/national data stores
– Keep data on site using your own store
– Install our backend storage platform
– Portal over external stores and data infrastructure
Federate with public community archives
– Access and link to content in different archives
– Deposit your results into archives
Run modelling tools
– Simulate with experimental data
– Compare and version; differentiate construction, validation & predicted data
22. Interoperate
Standards compliance
– Systems and Synthetic Biology standards support
– Support in finding standards & project wide standardisation
Consistent reporting
– Structured using ISA
– Our specially made Just Enough Results Model
Metadata curation
– Spreadsheet templates for omics data and samples
– Data and model annotation tools
Integrate with existing systems
– Integrated tools for modelling, parts, ELNs, LIMS
– Custom plugins for your tools
– REST API to plugin to your systems
Export
– Package and export into other repositories
– Export into other FAIRDOM installations
– COMBINE Archive export
24. Key Features at a Glance
Secure Sharing Space
– In your project, future projects, with others
– Metadata and/or data
Long term retention
– Keep results beyond a project lifetime
– Track collection of data and metadata
Smart publication
– Showcase results through FAIRDOMHub
– DOI snapshots
– Download; Export to publisher stores
Reproduce publications
– All experimental outputs organised together
– Consistent reporting
– Reproducible models and SOPs
– Simulate models with experimental data
Track analytics
– Track downloads and get credit
Reuse
Collaborate
25. Example
• Publication in 2014/2015
• SysMO call 1 Project
• MOSES project
• Using data from 2012
• Project ended in 2010
• Because FAIRDOM looked after the data and the model.
677 views
DOI
26. Roll your Own Hub
Adopted by over 30 other initiatives
data
store
local store
secure store
our people
Less data, more metadata, potentially wider access
processed data
published data
HTP data
Remember to cost your storage,
backup, archiving and licenses
27. Roll your Own Hub
Adopted by over 30 other initiatives
data
store
local store
secure store
our tools
our people
Less data, more metadata, potentially wider access
processed data
published data
HTP data
28. IMOMESIC pathway: Integrating Modelling of Metabolism and Signalling
towards an Application in Liver Cancer
https://fairdomhub.org/projects/24
[Ursula Klingmüller, Martin Böhm]
34. Projects: Premium and Super-Premium
project PALs,
modellers meet experimentalists
user forums, training, standards watch,
online resources, best practices,
custom data and model procedures,
linking data to analytics, custom
metadata setups
data curation support
technical model curation support
reproducible publishing support
help with plug-ins
new features development,
compliance mandates and standards
36. Standard, general,
community-level activities.
Use FAIRDOMHub.
Getting the most out of Hub
services. DIY local installation.
Premium. Direct support of
projects. Installation support.
Full customer service.
Super-Premium, extensive
tailoring, integrations and
adaptations of platforms.
Custom and dedicated services.
In house installation support.
Funders
Call and proposal support. Geared events.
Knowledge Hub and web site
Negotiating FAIRDOMHub block subscription.
per project negotiation
Remember to
cost your local
storage and
servers too
Plus any
licences you
need
~5-10% of total proposal budget
H2020 PM rates
20-40 days consultancy/annum
Stewardship Service Levels
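The rules of thumb on this slide (roughly 5-10% of the total proposal budget, and 20-40 days of consultancy per annum) can be turned into a quick planning range. This is only a sketch: the slide does not say exactly which service level each figure applies to, so the defaults below are assumptions, and local storage, servers and licences still need to be costed separately.

```python
def dm_budget_range(total_budget: float,
                    low_pct: float = 5.0,
                    high_pct: float = 10.0) -> tuple:
    """Lower and upper data management cost for a given total budget."""
    return (total_budget * low_pct / 100, total_budget * high_pct / 100)


def consultancy_days(project_years: int,
                     low_per_year: int = 20,
                     high_per_year: int = 40) -> tuple:
    """Lower and upper number of consultancy days over the whole project."""
    return (low_per_year * project_years, high_per_year * project_years)


if __name__ == "__main__":
    # Placeholder project: 1.5 million budget over 3 years.
    low, high = dm_budget_range(1_500_000)
    print(f"Data management budget: {low:,.0f} to {high:,.0f}")
    days_low, days_high = consultancy_days(3)
    print(f"Consultancy: {days_low} to {days_high} days over the project")
```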
37. FAIRDOM Model
By FAIRDOM Association
• Legal entity
• German
• Subcontract status, FEC
• Delivery will be through a
combination of preferred or
designated FAIRDOM facilities
• Contribution to the core built in
By FAIRDOM Facility
• Institutional entity
• National identity
• Partner/Co-investigator status
• Delivery through that FAIRDOM
Facility
• Contribution to the core by
arrangement
Manchester
Edinburgh
HITS
Leiden
ETHZ/UZH
ELIXIR Norway
NMBU
ISBE.si
National
Institute of
Biology
Association e.V.
38. Consortium Arrangements
CoBioTech Rules
• 21 national funders, each with
their own regulations
• Consortia
• 3-6 partners
• 3-8 partners if include AR, ES, IL,
LV, PT, RO, RU, SI, TR
• 3 different countries
• up to 2 partners from same
country
• Funder principles
• Subcontractors can be included and
are managed under the national or
regional financing regulations of the
eligible participant
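The consortium composition rules listed above lend themselves to a quick self-check before submission. The sketch below encodes one reading of the slide (at least three countries, at most two partners per country, 3-6 partners, or up to 8 when a partner from one of the listed countries is included); the function name is hypothetical, and the call text and national regulations remain the authoritative source.

```python
from collections import Counter

# Countries that, per the slide, allow a consortium of up to 8 partners.
EXTENDED_COUNTRIES = {"AR", "ES", "IL", "LV", "PT", "RO", "RU", "SI", "TR"}


def check_consortium(partner_countries: list) -> list:
    """Return a list of rule violations (empty if none were found).

    `partner_countries` holds one two-letter country code per partner.
    """
    problems = []
    n_partners = len(partner_countries)
    has_extended = bool(EXTENDED_COUNTRIES & set(partner_countries))
    max_partners = 8 if has_extended else 6

    if n_partners < 3:
        problems.append(f"only {n_partners} partners, at least 3 required")
    if n_partners > max_partners:
        problems.append(f"{n_partners} partners, at most {max_partners} allowed")

    per_country = Counter(partner_countries)
    if len(per_country) < 3:
        problems.append(f"only {len(per_country)} countries, at least 3 required")
    for country, count in sorted(per_country.items()):
        if count > 2:
            problems.append(f"{count} partners from {country}, at most 2 allowed")
    return problems


if __name__ == "__main__":
    print(check_consortium(["DE", "UK", "NL"]))
    print(check_consortium(["DE", "DE", "DE", "UK", "NL"]))
```

Subcontractors are not counted here, since the slide treats them separately under the financing regulations of the eligible participant.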
Consistent reporting
Simulate models with experimental data
Publisher/Funder Commons with DOIs
Download, package and export
Data Management Planning
Tailored Data Management design
Tailored metadata structures and pipelines
Tailored platform install
Tailored showcase and exchange
Requirements priority
Help in DM problem solving
Help in linking data to analytics
Help in compliance
Help during project movements and staff changes
Help at project sunset time
Help for reproducible publication
Build a PALs network
Tailored Training, Workshops, Site Visits
Curation support