Workshop - finding and accessing data - Cambridge August 22 2016 (Fiona Nielsen)
Finding and accessing human genomic data for research
University of Cambridge, United Kingdom | Seminar Room G
Monday, 22 August 2016 from 10:00 to 12:00 (BST)
Charlotte, Nadia and Fiona presented an overview of data sources around the world where you can find genomics data for your research and gave examples of the data access application for dbGaP and EGA with specific details relevant for University of Cambridge researchers.
The webinar discussed FAIRDOM services that can help applicants to the ERACoBioTech call with their data management plans and requirements. FAIRDOM offers webinars on developing data management plans, and their platform and tools can help with organizing, storing, sharing, and publishing research data and models in a FAIR manner by utilizing metadata standards. Different levels of support are available, from general community resources through their hub, to premium customized support for individual projects. Consortia can include FAIRDOM as a subcontractor within the guidelines of the ERACoBioTech call.
A Big Picture in Research Data Management (Carole Goble)
A personal view of the big picture in Research Data Management, given at the GFBio - de.NBI Summer School 2018 "Riding the Data Life Cycle!", Braunschweig Integrated Centre of Systems Biology (BRICS), 03-07 September 2018.
The state of global research data initiatives: observations from a life on th... (Projeto RCAAP)
The document discusses research data management and provides guidance on best practices. It defines research data management as the active management of data over its lifecycle. It recommends writing a data management plan to document how data will be created, stored, shared, and preserved. It also provides tips for making data accessible and reusable through use of metadata standards, documentation, open licensing, and depositing data in repositories with persistent identifiers. The goal is to help researchers manage and share their data effectively to increase access and reuse.
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article:
Carole Goble, "Better Software, Better Research", IEEE Internet Computing, vol. 18, no. 5 (Sept.-Oct. 2014), pp. 4-8, IEEE Computer Society.
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
FAIRy stories: tales from building the FAIR Research Commons (Carole Goble)
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object are a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down, like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVeL), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
Research Data (and Software) Management at Imperial: (Everything you need to ... (Sarah Anna Stewart)
A presentation on research data management tools, workflows and best practices at Imperial College London with a focus on software management. Presented at the 2017 session of the HPC Summer School (Dept. of Computing).
Data Access & Storage @ UWA - UWA Research Week September 2017Katina Toufexis
The document discusses research data management services provided by the University of Western Australia (UWA) Library. It notes that funders like the Australian Research Council (ARC) and National Health and Medical Research Council (NHMRC) require research data to be managed and shared. UWA policies also require research data related to publications to be available through the UWA Research Repository. The document provides guidance on creating data management plans, using appropriate licenses, and securely storing data long-term using the Institutional Research Data Storage (IRDS) system rather than third-party cloud services like Dropbox.
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adoption of the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.
S. Venkataraman (DCC) talks about the basics of Research Data Management and how to apply this when creating or reviewing a Data Management Plan (DMP). He discusses data formats and metadata standards, persistent identifiers, licensing, controlled vocabularies and data repositories.
Link to: dcc.ac.uk/resources
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ... (Amanda Whitmire)
A workshop as part of the International Digital Curation Conference 2016 on DMP development and support. This presentation demonstrates how we can use data management plans as a source of information to better understand researcher data stewardship practices and how to support them. Be sure to see the slide notes to better understand the presentation (most slides are just photos/icons).
Data repositories -- Xiamen University 2012 06-08 (Jian Qin)
The document discusses data repositories and services. It begins by defining what a data repository is, noting that it is a logical and sometimes physical partitioning of data where multiple databases reside. It then outlines some key aspects of data repositories, including technical features like standards, software, and staffing requirements. The document also discusses functions of repositories like content management, archiving, dissemination and system maintenance. It provides examples of institutional repositories and data repositories, highlighting characteristics of each. Finally, it provides a case study on Dryad, an international repository for data and publications in biosciences.
FAIR Workflows and Research Objects get a Workout (Carole Goble)
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub (https://workflowhub.eu) and RO-Crate Research Objects (https://www.researchobject.org/ro-crate) that’s how! A step towards FAIR Digital Objects gets a workout.
Presented at DataVerse Community Meeting 2021
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ... (Tom Plasterer)
As scientists in the life sciences we are trained to pursue singular goals around a publication, a validated target or a drug submission. Our failure rates are exceedingly high, especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. By using technical and social solutions together, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions that enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, these foster more accurate, timely and inclusive decision-making.
This slideshow was used in a Preparing Your Research Data for the Future course taught in the Medical Sciences Division, University of Oxford, on 2015-06-08. It provides an overview of some key issues, focusing on long-term data management, sharing, and curation.
The first workshop of the series "Services to support FAIR data" took place in Prague during the EOSC-hub week (on April 12, 2019).
Speaker: Kostas Repanas (EC DG RTD)
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources (Pistoia Alliance)
The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reuse of digital resources. Using recently developed software and metrics to assess FAIRness and supported through an ELIXIR Implementation Study, Michel worked with a subset of ELIXIR Core Data Resources to apply these technologies. In this webinar, he will discuss their approach, findings, and lessons learned towards the understanding and promotion of the FAIR principles.
Our regular Introduction to Data Management (DM) workshop (90 minutes). Covers very basic DM topics and concepts. The audience is graduate students from all disciplines. Most of the content is in the NOTES FIELD.
How Portable Are the Metadata Standards for Scientific Data? (Jian Qin)
The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with the ever-growing data. This paper reports the findings from a survey of metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4400+ unique elements from 16 standards and categorized these elements into 9 categories. The findings included that the highest counts of elements occurred in the descriptive category, and many of them overlapped with DC elements. This pattern was also repeated in the elements that co-occurred across different standards. A small number of semantically general elements appeared across the largest number of standards, while the rest of the element co-occurrences formed a long tail with a wide range of specific semantics. The paper discussed the implications of these findings in the context of metadata portability and infrastructure, and pointed out that large, complex standards and widely varied naming practices are the major hurdles for building a metadata infrastructure.
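The core of the survey's co-occurrence analysis is simple to picture. The sketch below is a hypothetical miniature, not the paper's actual data or code: three illustrative stand-in standards replace the 16 surveyed ones, and we count how many standards share each element name.

```python
from collections import Counter

# Illustrative stand-ins only; the real survey covered 4400+ elements
# drawn from 16 metadata standards.
standards = {
    "DC":       {"title", "creator", "subject", "date"},
    "DataCite": {"title", "creator", "publisher", "identifier"},
    "EML":      {"title", "creator", "geographicCoverage"},
}

# Count, for each element name, how many standards define it.
spread = Counter(el for elements in standards.values() for el in elements)

# A few general elements ("title", "creator") lead; the rest form the
# long tail of standard-specific semantics described in the paper.
for element, n in spread.most_common():
    print(f"{element}: appears in {n} standard(s)")
```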
Dataset Catalogs as a Foundation for FAIR* Data (Tom Plasterer)
BioPharma and the broader research community are faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at summary, version and distribution levels. Further, we've described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.
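To make the DCAT approach concrete, here is a minimal sketch of a catalog entry built with the rdflib library. The dataset IRI, titles and file names are invented for illustration; only the DCAT, DCT and VoID namespace URIs are the standard ones, and this is not the talk's actual catalog.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Standard W3C / DCMI namespace URIs for the vocabularies named above.
DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")
VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCT)
g.bind("void", VOID)

# Hypothetical dataset and distribution IRIs, purely for illustration.
ds = URIRef("https://example.org/datasets/rnaseq-study-42")
dist = URIRef("https://example.org/datasets/rnaseq-study-42/v1.csv")

g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCT.title, Literal("RNA-seq study 42 (example)")))
g.add((ds, DCT.description, Literal("Summary-level catalog entry for an omics dataset.")))
g.add((ds, DCAT.distribution, dist))
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.mediaType, Literal("text/csv")))

print(g.serialize(format="turtle"))
```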
This slideshow was used at a lunchtime session delivered at the Humanities Division, University of Oxford, on 2014-05-12. It provides a general overview of some key data management topics, plus some pointers on where to find further information.
This document discusses computational workflows and FAIR principles. It begins by providing background on computational workflows and their increasing importance. It then discusses challenges around finding, accessing, and sharing workflows. Next, it explores how applying FAIR principles to workflows could help address these challenges by making workflows and their associated objects findable, accessible, interoperable, and reusable. This includes applying metadata standards, using persistent identifiers, and developing principles for FAIR workflows and FAIR software. The document concludes by examining the roles and responsibilities of different stakeholders in working towards FAIR workflows.
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School (Carole Goble)
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face, ranging across European programmes (the SysMO and ERASysAPP ERA-Nets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE), as well as in PIs' labs and centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training, and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al. (2016), The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3, doi:10.1038/sdata.2016.18
FAIRDOM data management support for ERACoBioTech Proposals (FAIRDOM)
This document provides information about a webinar from the FAIRDOM Consortium on data management for ERACoBioTech full proposals. It includes:
- Details on how to budget for and include a data management plan in proposals
- A checklist for developing a data management plan covering topics like the types and volumes of data, data sharing and reuse, and making data FAIR
- An overview of the FAIRDOM services and software platform that can help with project data management and stewardship
Data Access & Storage @ UWA - UWA Research Week September 2017Katina Toufexis
The document discusses research data management services provided by the University of Western Australia (UWA) Library. It notes that funders like the Australian Research Council (ARC) and National Health and Medical Research Council (NHMRC) require research data to be managed and shared. UWA policies also require research data related to publications to be available through the UWA Research Repository. The document provides guidance on creating data management plans, using appropriate licenses, and securely storing data long-term using the Institutional Research Data Storage (IRDS) system rather than third-party cloud services like Dropbox.
As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators the ability to link data is critical for dynamic interoperability. Adoption of linked data paradigm allows BioPharma to focus on core business: delivering valuable therapeutics in a timely manner.
S. Venkataraman (DCC) talks about the basics of Research Data Management and how to apply this when creating or reviewing a Data Management Plan (DMP). He discusses data formats and metadata standards, persistent identifiers, licensing, controlled vocabularies and data repositories.
link to : dcc.ac.uk/resources
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...Amanda Whitmire
A workshop as part of the International Digital Curation Conference 2016 on DMP development and support. This presentation demonstrates how we can use data management plans as a source of information to better understand researcher data stewardship practices and how to support them. Be sure to see the slide notes to better understand the presentation (most slides are just photos/icons).
Data repositories -- Xiamen University 2012 06-08Jian Qin
The document discusses data repositories and services. It begins by defining what a data repository is, noting that it is a logical and sometimes physical partitioning of data where multiple databases reside. It then outlines some key aspects of data repositories, including technical features like standards, software, and staffing requirements. The document also discusses functions of repositories like content management, archiving, dissemination and system maintenance. It provides examples of institutional repositories and data repositories, highlighting characteristics of each. Finally, it provides a case study on Dryad, an international repository for data and publications in biosciences.
FAIR Workflows and Research Objects get a Workout Carole Goble
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub (https://workflowhub.eu) and RO-Crate Research Objects (https://www.researchobject.org/ro-crate) that’s how! A step towards FAIR Digital Objects gets a workout.
Presented at DataVerse Community Meeting 2021
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Tom Plasterer
As scientists in the life sciences we are trained to pursue singular goals around a publication or a validated target or a drug submission. Our failure rates are exceedingly high especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. Using both technical and social solutions together knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions to enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. When taken together more accurate, timely and inclusive decision-making is fostered.
This slideshow was used in a Preparing Your Research Data for the Future course taught in the Medical Sciences Division, University of Oxford, on 2015-06-08. It provides an overview of some key issues, focusing on long-term data management, sharing, and curation.
The first workshop of the series "Services to support FAIR data" took place in Prague during the EOSC-hub week (on April 12, 2019).
Speaker: Kostas Repanas (EC DG RTD)
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesPistoia Alliance
The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reuse of digital resources. Using recently developed software and metrics to assess FAIRness and supported through an ELIXIR Implementation Study, Michel worked with a subset of ELIXIR Core Data Resources to apply these technologies. In this webinar, he will discuss their approach, findings, and lessons learned towards the understanding and promotion of the FAIR principles.
Our regular Introduction to Data Management (DM) workshop (90-minutes). Covers very basic DM topics and concepts. Audience is graduate students from all disciplines. Most of the content is in the NOTES FIELD.
How Portable Are the Metadata Standards for Scientific Data?Jian Qin
The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with the ever-growing data. This paper reports the findings from a survey to metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4400+ unique elements from 16 standards and categorized these elements into 9 categories. Findings from the data included that the highest counts of element occurred in the descriptive category and many of them overlapped with DC elements. This pattern also repeated in the elements co-occurred in different standards. A small number of semantically general elements appeared across the largest numbers of standards while the rest of the element co-occurrences formed a long tail with a wide range of specific semantics. The paper discussed implications of the findings in the context of metadata portability and infrastructure and pointed out that large, complex standards and widely varied naming practices are the major hurdles for building a metadata infrastructure.
Dataset Catalogs as a Foundation for FAIR* DataTom Plasterer
BioPharma and the broader research community is faced with the challenge of simply finding the appropriate internal and external datasets for downstream analytics, knowledge-generation and collaboration. With datasets as the core asset, we wanted to promote both human and machine exploitability, using web-centric data cataloguing principles as described in the W3C Data on the Web Best Practices. To do so, we adopted DCAT (Data CATalog Vocabulary) and VoID (Vocabulary of Interlinked Datasets) for both RDF and non-RDF datasets at summary, version and distribution levels. Further, we’ve described datasets using a limited set of well-vetted public vocabularies, focused on cross-omics analytes and clinical features of the catalogued datasets.
This slideshow was used at a lunchtime session delivered at the Humanities Division, University of Oxford, on 2014-05-12. It provides a general overview of some key data management topics, plus some pointers on where to find further information.
This document discusses computational workflows and FAIR principles. It begins by providing background on computational workflows and their increasing importance. It then discusses challenges around finding, accessing, and sharing workflows. Next, it explores how applying FAIR principles to workflows could help address these challenges by making workflows and their associated objects findable, accessible, interoperable, and reusable. This includes discussing applying metadata standards, using persistent identifiers, and developing principles for FAIR workflows and FAIR software. The document concludes by examining the roles and responsibilities of different stakeholders in working towards FAIR workflows.
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will show explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http:// http://www.elixir-europe.org/) the European Research Infrastructure of 21 national nodes and a hub funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al, The FAIR Guiding Principles for scientific data management and stewardship Scientific Data 3, doi:10.1038/sdata.2016.18
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM
This document provides information about a webinar from the FAIRDOM Consortium on data management for ERACoBioTech full proposals. It includes:
- Details on how to budget for and include a data management plan in proposals
- A checklist for developing a data management plan covering topics like the types and volumes of data, data sharing and reuse, and making data FAIR
- An overview of the FAIRDOM services and software platform that can help with project data management and stewardship
Aim: to show how research data management can contribute to the success of your PhD.
- What is research data and why is it important?
- The research data lifecycle
- Research data: more than just your results
- FAIR data and Open Research
- The DMP online tool
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...OSTHUS
During SmartLab Exchange 2015, Allotrope Foundation and OSTHUS presented the latest update on the Allotrope Framework. To learn more, please view the slides below.
Presented by:
Dana Vanderwall (BMS Research IT & Automation), Patrick Chin (Merck Research Laboratories IT), Wolfgang Colsman (OSTHUS)
Pemanfaatan Big Data Dalam Riset [Utilization of Big Data in Research] 2023.pptx (elisarosa29)
1. The document discusses the use of big data in various fields such as education, bioinformatics, and genomics. It provides examples of studies that have used big data analytics for student performance monitoring, genomic repeats detection, and understanding trends in educational research.
2. Methodologies for big data analysis discussed include using Apache Spark for efficient processing of large genomic datasets and building predictive models from multiple educational variables.
3. Key applications highlighted are automatic grading of MOOC assignments using machine learning and analyzing program learning outcomes in outcome-based education systems.
Data Management Planning for researchers (Sarah Jones)
This document provides information about creating a data management plan (DMP) for researchers. It begins with defining what a DMP is - a short plan that outlines what data will be created, how it will be managed and stored, and plans for sharing and preservation. It then discusses the common components of a DMP, including describing the data, standards and methodologies, ethics and intellectual property, data sharing plans, and preservation strategies. The document provides examples of DMP requirements and recommendations from funders. It offers tips for creating a good DMP, including thinking about the needs of future data re-users, consulting stakeholders, grounding plans in reality, and planning for sharing from the outset. Finally, it discusses tools and resources.
This document discusses reproducible research and provides guidance on how to conduct research in a reproducible manner. It covers:
1. The importance of reproducible research due to large datasets, computational analyses, and the potential for human error. Ensuring reproducibility requires new expertise and infrastructure.
2. Key aspects of reproducible research include data management plans, version control, use of file formats and software/tools that allow reproducibility, and publishing data and code to allow others to replicate results.
3. Reproducible research benefits the scientific community by increasing transparency and allows researchers to re-analyze their own data in the future. Journals and funders are increasingly requiring reproducibility.
A presentation I gave at the 2018 Molecular Med Tri-Con in San Francisco, February 2018. This addresses the general challenge of biomedical data management, some of the things to consider when evaluating solutions in this space, and concludes with a brief summary of some of the tools and platforms in this space.
This document introduces FAIRDOM, a consortium that provides a platform and services to help researchers organize, manage, share, and preserve research outputs according to FAIR principles. FAIRDOM has been in operation for 10 years and has over 50 installations supporting over 118 projects. It provides tools and services to help researchers collaborate better and integrate their data, models, publications and other research objects. FAIRDOM also works with other organizations and infrastructure providers to support broader research initiatives.
FAIR - Working Data - it's not just about FAIR publishing. Presented by John Morrissey from CSIRO at the C3DIS post-conference workshop "Managed data – trusted research: an introduction to Research Data Management", 31 May 2018, Melbourne.
This document summarizes a seminar on data management for undergraduate researchers. It discusses what data is, why it needs to be managed, and key aspects of the data management process such as data organization, metadata, storage, and archiving. Topics covered include file naming best practices, version control, documentation, metadata standards, storage options, and long-term archiving. The goal is to help researchers organize and document their data so it can be understood, preserved, and reused.
Elsevier's RDM Program: Habits of Effective Data and the Bourne Ultimatum (Anita de Waard)
Elsevier's RDM Program: Ten Habits of Highly Effective Data
The document outlines Elsevier's research data management (RDM) program and efforts to support the effective management of research data. It discusses a "Maslow hierarchy" with 10 aspects of highly effective research data from stored to integrated. It provides examples of Elsevier's RDM tools and services like Hivebench, Mendeley Data, and DataSearch that help support storing, sharing, citing, and discovering research data. It also discusses collaborative RDM efforts like Force11, Research Data Alliance, and Crossref as well as journal initiatives to improve reproducibility. The document concludes with a proposed partnership where an institution could pilot and provide feedback on Elsevier's
This workshop aims at gathering together practitioners of all levels and from a variety of research areas (agronomy, plant biology, food, life sciences, etc.) to compare best practices, points of view and projects about producing and consuming data in the agrifood field.
As happens for digital data in general, current trends in this arena include the integration of "traditional" semantic-based approaches (e.g., ontologies, RDF-based linked data) with lightweight schemas (e.g., Bioschemas/schema.org), the use of JSON-based APIs, and the development of data lakes and knowledge graphs based on NoSQL technologies and property-graph databases (e.g., Neo4j, TinkerPop/Gremlin).
Workshop participants will get an opportunity to discuss how those approaches and technologies are being used in the agrifood field, for the purpose of realising the FAIR data principles and making data sharing a powerful tool for research, industry or socio-economic investigation. In particular, we will propose an interactive session to outline how participant-proposed datasets can be encoded through Bioschemas or similar approaches.
This document summarizes Rob Grim's presentation on e-Science, research data, and the role of libraries. It discusses the Open Data Foundation's work in promoting metadata standards like DDI and SDMX. It also outlines the research data lifecycle and how metadata management can help libraries support research through services like data registration, archiving, discovery and access. Finally, it provides examples of how Tilburg University library supports research data through services aligned with data availability, discovery, access and delivery.
This presentation was provided by Carly Strasser of the Chan Zuckerberg Initiative during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
2. Why do you need data management?
- Where do you store your experimental data?
- What happens with data when a PhD student leaves the group?
- Are all data complete for a publication?
- Do you make regular backups of your local machine?
- Do you send emails to share data with your colleagues?
- Do you always store email attachments in your local directory?
- Do you store all different versions of a data file together in the same place?
- Which protocol was used for the experiment?
...
3. How well is your experiment documented?
(Vahan Simonyan, Center for Biologics Evaluation and Research, Food and Drug Administration, USA)
4. Purpose of Project Data Management
• Track collection of raw and processed (secondary) data, models & metadata
• Maintain experimental context
• Organise and link assets
• Choose what to keep and what to ditch
• Report consistently
• Reproducible publications
• Promote standardised metadata practices
• Exchange among colleagues
• How and when to share and publish
• Get and give credit
• Retain and find beyond the project
• Integrate with legacy, home-grown, external systems
• Reuse tools and community archives
• Support automation and analytics workflows. Support curation
[Data lifecycle diagram: creating data → processing data → analysing data → preserving data → access to data → re-using data]
5. Purpose of Project Data Management
[Diagram: Organisation, Communication and Dissemination, towards Partners, Funders and the Public]
6. The FAIR Guiding Principles for scientific data management and stewardship
https://www.nature.com/articles/sdata201618 (2016)
8. FAIR Checklists
Making Data Findable (documentation and metadata management)
• What documentation and metadata will accompany the data to assist its discoverability? (Details on methodology, definitions, procedures, SOPs, vocabularies, units, dependencies, etc.)
• What information is needed for the data to be read and interpreted in the future?
• What naming conventions will be used?
• How will you approach versioning your data?
• How will you capture / create this documentation and metadata?
• How do you ensure the completeness of the captured data?
Making Data Accessible
Specify which data will be made openly available, taking into consideration:
• What ethics and legal compliance issues do you have, if any? Do you need consent for data preservation and sharing? Do you have to protect certain data? Is any data sensitive?
• Do you think you might have Intellectual Property Rights issues? Have you considered ownership of the data, licensing, restrictions on use?
• Do you think you will need to embargo any data?
• How will you make the data available? (Consider the platforms you will use: databases, repositories, etc.)
• What methods or software tools are needed to access the data? Should you include documentation detailing how to use/access the software that is needed for accessing the data? Is it possible to include this software with the data (e.g. source code, Docker, etc.)?
• If there are any restrictions on accessibility, how will you provide access?
Making Data Interoperable
• What standards (metadata vocabularies, formats, checklists) or methodologies will you use?
• How do you address data and model quality? What validation steps do you foresee?
• Will you use standardised vocabulary for all data types to allow inter-disciplinary interoperability?
• Where you cannot use standardised vocabulary for all types of data, can you map to more commonly used ontologies?
Making Data Re-usable
• How will you licence your data to permit the widest re-use possible?
• When will the data be made available for re-use? Does this include an embargo period? (If so, why?)
• Which data will be available for re-use during/after the project? If not, why?
• What are your data quality assurance processes?
• How long do you expect your data to remain re-usable?
9. FAIRDOM Initiative
- develop a community
- establish an internationally sustained Data and Model Management service
- a joint action of the ERA-Net ERASysAPP and the European Research Infrastructure ISBE
10. A bit of history: 11-year anniversary
[Timeline: 2008, 2010, 2012, 2014, 2016, 2018, 2020]
- Standards-based asset management (data, models, workflows, SOPs…) for multi-party projects
- Sensitive sharing
- Self-deposit / curation
- Mixed stewardship skills
- Legacy local systems
- Community resources
- Started in Systems Biology. Now widened.
11. SEEK Software
- Open source web platform for sharing scientific research assets, processes and outcomes
- Associations between data, along with information about the people and organisations involved (yellow pages)
- ISA (Investigation, Study, Assay) structure for describing how individual experiments are aggregated into studies and investigations
- Flexible and detailed sharing permissions
- DOIs can be generated for individual items or entire aggregates
- Semantic technology, allowing sophisticated queries over the content
- Collection of metadata
https://seek4science.org/
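SEEK instances such as the FAIRDOMHub also expose their catalogue programmatically. As a minimal sketch, assuming the JSON:API-style read interface described in the SEEK documentation (endpoint names and response shape may differ between SEEK versions), listing the public models could look like this:

```python
import requests

# Public FAIRDOMHub SEEK instance; the /models endpoint and the
# JSON:API response layout are assumptions based on the SEEK API docs.
BASE = "https://fairdomhub.org"

resp = requests.get(
    f"{BASE}/models",
    headers={"Accept": "application/vnd.api+json"},
)
resp.raise_for_status()

for item in resp.json().get("data", []):
    # Each JSON:API resource carries a type, an id and attributes.
    print(item["type"], item["id"], item["attributes"].get("title"))
```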
18. Data Files, SOPs, Documents
- no file format restrictions
- some formats allow the content to be viewed in SEEK: e.g. Excel, Word, PDF, XML, PNG
19. Models
- SBML model simulation
- Model comparison
- Model versioning
- Reproducing simulations
[Jacky Snoep, Dagmar Waltemath, Martin Peters, Martin Scharm]
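The models here are typically SBML files. As a small illustration (not part of the original slides), the standard python-libsbml bindings can inspect a model before simulation; the file name is a placeholder for any SBML model, e.g. one downloaded from a SEEK entry:

```python
import libsbml  # pip install python-libsbml

# "model.xml" is a placeholder path for any SBML model file.
doc = libsbml.readSBML("model.xml")
if doc.getNumErrors() > 0:
    doc.printErrors()  # report any parse/validation problems

model = doc.getModel()
print("Model:", model.getId())
print("Species:", model.getNumSpecies())
print("Reactions:", model.getNumReactions())
```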
21. Tracking model versions smartly
Scharm, M., Wolkenhauer, O., & Waltemath, D. (2015). An algorithm to detect and communicate the differences in computational models describing biological systems. Bioinformatics.
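The cited algorithm (implemented in the BiVeS tool) computes a structured, tree-based diff between model versions. The sketch below is a deliberately simpler, set-based stand-in for the same idea, reporting entities added to or removed from an SBML model between two hypothetical local files:

```python
import libsbml  # pip install python-libsbml

def model_parts(path):
    """Return the sets of species and reaction identifiers in an SBML file."""
    model = libsbml.readSBML(path).getModel()
    species = {model.getSpecies(i).getId() for i in range(model.getNumSpecies())}
    reactions = {model.getReaction(i).getId() for i in range(model.getNumReactions())}
    return species, reactions

# Hypothetical file names for two versions of the same model.
old_species, old_reactions = model_parts("model_v1.xml")
new_species, new_reactions = model_parts("model_v2.xml")

print("species added:    ", sorted(new_species - old_species))
print("species removed:  ", sorted(old_species - new_species))
print("reactions added:  ", sorted(new_reactions - old_reactions))
print("reactions removed:", sorted(old_reactions - new_reactions))
```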
29. More than simple supplementary materials
- 16 data files (kinetic, flux inhibition, runout)
- 19 models (kinetics, validation)
- 13 SOPs
- 3 studies (model analysis, construction, validation)
- 24 assays/analyses (simulations, model characterisations)
Penkler, G., du Toit, F., Adams, W., Rautenbach, M., Palm, D. C., van Niekerk, D. D. and Snoep, J. L. (2015), Construction and validation of a detailed kinetic model of glycolysis in Plasmodium falciparum. FEBS J, 282: 1481–1511. doi:10.1111/febs.13237
30. Packaging: COMBINE archive
Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D
SEMS, University of Rostock
A zip-like file with a manifest & metadata:
- Bundling files
- Keeping provenance
- Exchanging data
- Shipping results
Bergmann, F. T., Adams, R., Moodie, S., Cooper, J., Glont, M., Golebiewski,
M., ... & Olivier, B. G. (2014). COMBINE archive and OMEX format: one file to
share all information to reproduce a modeling project. BMC Bioinformatics,
15(1), 1.
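
Because a COMBINE archive is just a zip with a manifest, a minimal one can be
written with the standard library alone. A sketch: the format URIs follow the
OMEX specification cited above, while the model content is a placeholder:

import zipfile

# Format URIs below follow the OMEX specification (Bergmann et al. 2014);
# the SBML content is a placeholder, not a real model.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./manifest.xml"
           format="http://identifiers.org/combine.specifications/omex-manifest"/>
  <content location="./model.xml"
           format="http://identifiers.org/combine.specifications/sbml"/>
</omexManifest>
"""

# A COMBINE archive is a zip file; manifest.xml describes every entry.
with zipfile.ZipFile("project.omex", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("manifest.xml", MANIFEST)
    zf.writestr("model.xml", "<sbml><!-- placeholder SBML model --></sbml>")
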
31. Packaging: Research Objects
Standards-based metadata framework for bundling (scattered) resources with
context and citation
http://researchobject.org
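
In its later RO-Crate incarnation (an assumption here, since the slide names
Research Objects in general), the bundle centres on a single JSON-LD metadata
file. A minimal illustrative sketch with hypothetical file names:

import json

# Minimal RO-Crate style metadata for a bundle holding one data file.
# The structure follows the RO-Crate specification (https://w3id.org/ro/crate);
# file names and descriptions are illustrative.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example study bundle",
            "hasPart": [{"@id": "data.csv"}],
        },
        {"@id": "data.csv", "@type": "File", "name": "Raw measurements"},
    ],
}

with open("ro-crate-metadata.json", "w") as fh:
    json.dump(crate, fh, indent=2)
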
32. SEEK as project-specific local
instances or as central FAIRDOMHub
Service hosted at HITS
(Institutional Guarantee at least until 2029)
33. FAIRDOMHub Statistics
1st July 2019
Programmes 60
Projects 144
Institutions 274
People 1291
Data files 2280
Models 487
SOPs 301
Sample types 63
Presentations 729
Publications 370
Events 178
34. FAIRDOM Platform
Free and Open Source
Front end (Project(s) Hub):
• Web-based portal
• Project controlled spaces
• Metadata catalogue & yellow pages
• Results repository, dissemination and collaboration
• Tool gateway
Back end (onsite storage & analytics):
• Tracking, data analytic pipelines
• Extract, Transform and Load direct from the instruments
• Large data management
• LIMS, auto-archiving
35. Back end: Instrument Data Management, LIMS, ELN
• samples
• protocols
• instruments
• data management
• experimental description
NeLS, Norway's national e-infrastructure for Life Science:
https://nels.bioinfo.no/
openBIS Electronic Laboratory Notebook and Laboratory Information Management
System (ELN-LIMS):
https://csb.ethz.ch/tools/software/openbis-lims-eln.html
36. FAIR collaboration from the ERA-Net ERASysAPP
[Adapted from Ursula Klingmüller, Martin Böhm]
Excemplify
Antibody Database
37. Programme: overarching research theme (The Digital Salmon)
Project: research grant (DigiSal, GenoSysFat)
Investigation: a particular biological process, phenomenon or thing
(typically corresponds to [plans for] one or more closely related papers)
Study: experiment whose design reflects a specific biological research
question
Assay: standardized measurement or diagnostic experiment using a specific
protocol (applied to material from a study)
Jon Olav Vik,
Norwegian University of Life Sciences
Integration with Norway's national e-infrastructure for Life Science (NeLS)
38. Front End: Find, Access and Organise assets
• Project controlled protected spaces
– Working space, show space for results
– Supp. materials space for publications
– Yellow pages and collaboration
– Upload or link to data
• One place catalogue
– Regardless of physical store
– ISA with shared metadata
– Standards-compliant
• Linked with other systems
– Project on-site (secure) repositories
– Public deposition archives
– Integrated with JWS Online modelling tools
“Using FAIRDOMHub my own lab colleagues saw what I was doing and called to
collaborate!”
https://fairdomhub.org/
39. Catalogue across repositories regardless of location
Upload or reference assets from: in-house stores, external databases,
publishing services, secure stores, model resources
42. PALs - Project Area Liaisons
The DM Team provides data management training to the PALs; the PALs return
requirements & suggestions:
• Training needs for users
• Suggestions to improve SEEK
• Requirements for new SEEK features and DM services
43. PALs - Project Area Liaisons
- our user focus group
- post docs, postgrads and techs
- experimentalists, modellers and bioinformaticians
- act as advocates and communicate our progress back to their projects
44. Data Stewards
function, profession, cultural shift
• 500,000 needed in Europe*
• Specialist skills
• Career pathways
• Recognition
Curation and management
• Supported, Resourced
• Recognised, Rewarded
Sharing policy and practice embedded
* Realising the European Open Science Cloud (2016)
47. LiSyM (Liver Systems Medicine)
German Research Network on
Systems Medicine for Liver Disease
Supported by
The German Federal Ministry of Education and Research 2016-2020
Multiple disciplines
• Medicine
• Biology, Biochemistry
• Pharmacology
• Physics
• Bioinformatics
• Data management
• Industry
38 independent research groups:
• Bayer AG
• Max Planck (Dresden and Berlin)
• MEVIS Fraunhofer (Bremen)
• Leibniz Institute IfaDo (Dortmund)
• Charité (Berlin)
• DKFZ (Heidelberg)
• Hospitals: Dresden, Kiel, Aachen, Homburg, Berlin, Heidelberg, Munich
• + 18 universities
http://www.lisym.org/
49. Clinical data sharing concept
Goal:
• Spread descriptions of the data throughout the consortium
Challenge:
• Some partners cannot share their data
Solution:
• Share the table structure
• Create & share common code
• Create summaries in a distributed fashion (see the sketch below)
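
A minimal sketch of that "share code, not data" pattern: every partner runs
the same script over a locally held table that follows the agreed structure,
and only the aggregate summary leaves the site. File and column names are
hypothetical:

import csv
from statistics import mean, stdev

# Every partner runs this same script locally over a table that follows the
# agreed, shared structure; only the aggregate summary leaves the site.
# "patients.csv" and its "age" column are hypothetical.
def summarise(path="patients.csv"):
    with open(path, newline="") as fh:
        ages = [float(row["age"]) for row in csv.DictReader(fh)]
    return {"n": len(ages), "age_mean": mean(ages), "age_sd": stdev(ages)}

if __name__ == "__main__":
    print(summarise())  # share this summary, never the row-level data
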
50. NMTrypI
Trypanosomatid parasites cause sleeping sickness, leishmaniasis and Chagas
disease, occurring in Africa, South America and India
EU-funded project 2014 – 2017
Goal: new candidate drugs against
Trypanosomatidic infections
Consortium: 12 partners (3 SMEs and 9
academics) in Europe and in disease-
endemic countries (Italy, Greece, Portugal,
Spain, Germany, UK, Sudan, Brazil)
https://fp7-nmtrypi.eu
51. NMTrypI specific challenges
• New visualizations of spreadsheet data
• Cross-references with external databases
• Chemical compound specific features
– show structure
– allow (sub)structure search (see the sketch after this list)
– create compound summary reports
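
A sketch of one way to meet the (sub)structure-search requirement, using the
open-source RDKit toolkit. The slides do not name a specific library, so
RDKit is our assumption, and the compound list is illustrative:

from rdkit import Chem  # third-party: pip install rdkit

# Substructure search over a small compound list. The SMILES strings are
# illustrative stand-ins for the project's compound collection.
compounds = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "ethanol": "CCO",
}
pattern = Chem.MolFromSmarts("c1ccccc1")  # query: an aromatic six-ring

for name, smiles in compounds.items():
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None and mol.HasSubstructMatch(pattern):
        print(name, "matches the substructure")
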
55. de.NBI - The German Network for
Bioinformatics Infrastructure
de.NBI consortium
• 39 project partners
• 30 institutions
• 8 service centers
https://www.denbi.de/
Mission
• Provide, expand and improve specialized
bioinformatics tools
• Provide access to computing and storage
capacities
• Provide regular training events and workshops
• Maintain and develop specific high-quality data
resources
56. Research and service topics of de.NBI service centers
HD-HuB
Bioinformatics Infrastructures in Biomedical Research
• Human genetics and genomics
• Metagenomics
• Systematic phenotyping of human cells
• Epigenetics
BiGi
Microbial Research for Biotechnology and Medicine
• High performance computing services
• Repository of reusable workflows
• Comparative genomics and meta-omics
• Post-genomics data integration
BioData
Reference Databases, Services and Tools
• Ribosomal RNAs (SILVA)
• Environmental data (PANGAEA)
• Taxon-associated metadata (BacDive)
• Enzymes & Ligands (BRENDA/EnzymeStructures)
CIBI
Tools for omics data and imaging
• Open-source libraries (OpenMS, SeqAn, FIJI)
• Tools for NGS, mass spec, and imaging
• Workflow engine (KNIME) for automation
• (Multi-)omics data analysis workflows
RBC
RNA Bioinformatics
• Analysis of RNA-related data
• Life science data analysis with Galaxy
• Meta-transcriptomics
• Epigenetic research
de.NBI-SysBio
Standards-based Systems Biology
• Data and model management tools
• SABIO-RK reaction kinetics data
• Methods and tools for modeling in Systems Biology
• Standards & tools for model search and management
GCBN
Crops and BioGreenformatics
• Plant genetic resources and traits
• Bridging genotypes to phenotypes
• Plant gene and genome annotation
• Enabling technologies to improve crops
BioInfra.Prot
Bioinformatics for Proteomics
• Comprehensive proteomics workflow
• Data publication, analysis & tool services
• Quality standards for targeted proteomics
• Lipidomics
57. Current Actions in de.NBI
• Goal: Make Data FAIRness part of all de.NBI centers
• Idea: Have service centers collect more metadata. No metadata, no
service.
• Approach: Build use cases that involve data management and service
centers
Two example use cases from the medical proteomics center:
• Statistical advice service
– tracking of advice given
– making reports FAIR
• From data to PRIDE
– Catalogue links to PRIDE in SEEK/FAIRDOMHub (see the sketch below)
– Store and standardise intermediate files
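
A hedged sketch of cataloguing a PRIDE entry as a remote data file via the
SEEK write API. The payload shape and token-based authentication follow our
reading of the SEEK JSON API documentation (https://docs.seek4science.org/)
and should be verified against your instance; the project id and token are
placeholders:

import requests  # third-party: pip install requests

# PXD000001 is a real public PRIDE accession; the project id and API token
# are placeholders. Payload shape and token authentication follow our reading
# of the SEEK JSON API docs and must be verified for your instance.
payload = {
    "data": {
        "type": "data_files",
        "attributes": {
            "title": "PRIDE dataset PXD000001 (external link)",
            "content_blobs": [
                {"url": "https://www.ebi.ac.uk/pride/archive/projects/PXD000001"}
            ],
        },
        "relationships": {"projects": {"data": [{"id": "1", "type": "projects"}]}},
    }
}

resp = requests.post(
    "https://fairdomhub.org/data_files",
    json=payload,
    headers={
        "Accept": "application/vnd.api+json",
        "Authorization": "Token YOUR_API_TOKEN",
    },
    timeout=30,
)
print(resp.status_code)
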
58. Summary FAIRDOM
FAIRDOM Software Platform + Tools:
• A central public hub for projects (144 projects)
• Customised project installations (30+ installations)
• Project stewardship consultancy services
• Community activities
59. Summary FAIRDOM
Find & Access:
• Central catalogue
• Link to original files and external resources
• Search
• Metadata tagging and standards
• Yellow pages of projects and people
• Access control to spaces
• Embedded tools
Interoperate:
• Rich metadata, standards compliance
• Consistent reporting (ISA)
• Curation support
• Integration with other resources, archives, tools
• Export packages
Reuse:
• Secure sharing space
• Long term retention
• Reproducible publication
60. Why do you need data management?
- Where do you store your experimental data?
- What happens with the data when a PhD student leaves the group?
- Are all data complete for a publication?
- Do you make regular backups of your local machine?
- Do you send emails to share data with your colleagues?
- Do you always store email attachments in your local directory?
- Do you store all different versions of a data file together in the same place?
- Which protocol was used for the experiment?
...
61. What can you do? Be FAIR!
1. make a Data Management Plan
2. use standard identifiers
3. use metadata standards
4. catalogue / register data with metadata
5. define and share your SOPs
6. use data (assets) management platforms and tools that work
together
7. deposit into public archives
8. have a sustainability / end project plan
9. provide resources and support, and that means people too
10. embed data management into work practices and do some
training
11. give credit
12. check if you have sensitive data issues
13. educate your supervisors, institutions and peers
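
Several of these items (standard identifiers, metadata standards, licensing,
credit) can be seen together in even a tiny metadata record. An illustrative
sketch; all values except the licence URL are hypothetical placeholders:

import json

# All values are hypothetical placeholders except the licence URL.
record = {
    "title": "Kinetic measurements, strain X, run 3",
    "identifier": "https://doi.org/10.5281/zenodo.0000000",  # placeholder DOI
    "creators": [
        {"name": "A. Researcher", "orcid": "https://orcid.org/0000-0000-0000-0000"}
    ],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["glycolysis", "kinetics"],
    "dateCreated": "2019-07-01",
}
print(json.dumps(record, indent=2))
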