Wolfgang Kuchinke
University Duesseldorf, Duesseldorf, Germany
CORBEL Project
W. Kuchinke (2018)
Repositories in an Open Data Ecosystem
ECRIN – CORBEL WP 3.3 Working Group Meeting
14. Jun 2018, Paris, France

Open Data – Open Science
Towards an ecosystem for Open Data and Sensitive Data

Open data is data that can be freely used, shared and built on by anyone, anywhere, for any purpose.
Data sharing is the precondition for the reproducibility of research results.
Open Definition (http://opendefinition.org/okd/)

FAIR data framework
Data sharing is critical for the reproducibility and progress of research. Providers of human data (e.g. publicly or privately funded repositories and data archives) fulfill their social responsibility towards data donors when their shareable data conform to the FAIR (findable, accessible, interoperable, reusable) principles.

The Importance of Metadata for Open Research Data Management
Research data, metadata and data management plans are part of Open Research Data Management. Research data can contain a wide diversity of collected information: text or numerical data, biosamples, images, questionnaires, recorded videos, models, software, reports, workflows, etc. The type and format of this information need to be described. For this purpose, data need to be complemented by proper metadata.
Metadata are essential to recover and reuse research data. Metadata standards allow interoperability across different systems, such as repositories. Metadata can be classified into three main types: descriptive, administrative, and structural. Descriptive metadata serve to discover and understand a data source and refer, for example, to the title, author, publication date or abstract, as in the Dublin Core schema.
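To make the idea of descriptive metadata concrete, the following minimal sketch serializes the fields named above (title, author, publication date, abstract) as a flat Dublin Core record. The wrapper element `metadata`, the function name and all field values are illustrative assumptions; only the Dublin Core namespace and the element names `title`, `creator`, `date` and `description` come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize descriptive metadata as a flat Dublin Core XML record."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; not taken from a real dataset.
record = dublin_core_record({
    "title": "Example sensor data set",
    "creator": "Example Author",
    "date": "2018-06-14",
    "description": "Short abstract describing the data source.",
})
print(record)
```

A repository would normally add administrative and structural metadata as well; this sketch covers the descriptive part only.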

Data life cycle
Conceptual representation of the life cycle of data in biomedical data repositories (secure storage of biomedical research and healthcare-related data), from the moment of data generation, through their utilization and transformation into useful information and their publication, to their long-term archiving or destruction.

Data Ecosystem
Repositories are the core components of an Open Data Ecosystem
Many tools and data services support repositories
Different aspects:
FAIR principles
Open data and clinical trials data should be stored together
Cloud storage should be enabled
Analysis tools should be provided
Different data repositories should be connected to each other

FAIR data framework
Fig. Modified from: Dep. Med Inf. UMG, Groningen
Data Sources: ePatient Record; Clinical Trial Data; Registry Data; Patient Reported Outcome; Sensor Data; Biomaterial Data; eLab Data; Lifestyle Data, Weather, Medication, Social Media
Data Integration and Data Curation: Transform; Ontology match; Linkage
Data Storage: Data Warehousing; Data Marts; Data management and Analysis; Open Data
Data Usage: Data query; Data visualisation; Data analysis; Collaboration; Therapy Board; Data transfer; Data sharing; Publication
Data Governance: Persistent Identifiers; Privacy; Metadata harvesting; Data Dictionary; Consent Management; Identity Management; Anonymisation; Pseudonymisation; Data Annotation

Move Data Governance to the data generation step in the data life cycle
[Figure: the FAIR data framework diagram of the previous slide (Data Sources, Data Integration and Data Curation, Data Storage, Data Usage, Data Governance), with the added callout "Data Governance for privacy protection already at the step of data generation"]

Persistent identifiers, metadata and privacy protection become part of data generation
Built-in data governance and privacy protection
[Figure: the same FAIR data framework diagram (Data Sources, Data Integration and Data Curation, Data Storage, Data Usage); Data Governance components shown: Persistent Identifiers, Privacy protection, Metadata harvesting, Data Dictionary, Consent Management, Identity Management, Anonymisation, Pseudonymisation, Data Annotation]
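As an illustration of what privacy protection "already at the step of data generation" could look like, the sketch below replaces the direct patient identifier with a keyed-hash pseudonym at the moment a record is created. This is an assumption about one possible technique (HMAC-SHA256), not a method described in the presentation; the function names, the key and the sample identifier are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability in this sketch


def capture_record(patient_id: str, measurement: float, secret_key: bytes) -> dict:
    """Build the stored record; the direct identifier is never written to it."""
    return {"pseudonym": pseudonymise(patient_id, secret_key), "value": measurement}


# Hypothetical key; a real deployment would keep it with a trusted party.
KEY = b"demo-key-not-for-production"
first = capture_record("patient-0001", 36.8, KEY)
second = capture_record("patient-0001", 37.1, KEY)

# The same patient maps to the same pseudonym, so records stay linkable.
print(first["pseudonym"] == second["pseudonym"])
```

Because the pseudonym is stable, later linkage across data sources remains possible without storing the original identifier alongside the data. Pseudonymised data are still personal data, so consent and identity management remain necessary.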

Components of the Repository Data Ecosystem
An ecosystem suitable even for Sensitive Data

What is CKAN?
The Comprehensive Knowledge Archive Network (CKAN) is an open-source open data portal for the storage and distribution of open data.
It is aimed at data publishers who want to make their data open and available. The system is used both as a public platform on Datahub and in various government data catalogues (e.g. UK's data.gov.uk, the Dutch National Data Register, the United States government's Data.gov and the Australian government's Gov 2.0).
https://ckan.org/

CKAN
Open-source data portal platform
Developed by the OKFN (Open Knowledge Foundation)
It is a complete out-of-the-box software solution
Tools to streamline publishing, sharing, finding and using data
CKAN includes a web interface and the CKAN Action API
Visualizations for structured data resources (such as CSV files)
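The Action API mentioned above is a JSON-over-HTTP interface. The following minimal sketch shows how a dataset search could be expressed against it: it builds the URL for the `package_search` action and reads the dataset titles from a response. The host `ckan.example.org`, the helper names and the sample response are illustrative assumptions; no network request is made here.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build the URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> list[str]:
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful action call")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Hypothetical CKAN site; replace with a real portal to try it out.
url = package_search_url("https://ckan.example.org", "clinical trial", rows=5)
print(url)

# Hand-written sample in the shape of an Action API response (illustrative only).
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "trial-a", "title": "Trial A summary data"},
            {"name": "trial-b", "title": "Trial B summary data"},
        ],
    },
})
titles = dataset_titles(SAMPLE_RESPONSE)
print(titles)
```

In practice the URL would be fetched with an HTTP client and the response body passed to `dataset_titles`.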

CKAN and reusability of healthcare data
Catalog and metadata saved in CKAN can be harvested via OAI-PMH
Through the CKAN cloud environment, wearable and stationary sensor data stored in individual CKAN instances can be integrated
Analysis and integration of clinical data of users based on diagnostic data saved on the CKAN-based cloud
Prediction of situations, events, and incidents
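Harvesting via OAI-PMH, as mentioned on this slide, comes down to issuing a `ListRecords` request and reading the Dublin Core metadata of each returned record. The sketch below shows both steps under stated assumptions: the endpoint `https://ckan.example.org/oai` is hypothetical (where a CKAN site exposes OAI-PMH depends on its configuration), and the XML is a hand-written sample in the shape of an `oai_dc` response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(endpoint: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{endpoint}?{query}"


def harvested_titles(xml_text: str) -> list[str]:
    """Collect the Dublin Core titles of all records in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.iterfind(".//oai:record//dc:title", NS)]


# Illustrative response; a real harvester would fetch it from the endpoint.
SAMPLE_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Wearable sensor recordings</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Stationary sensor recordings</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("https://ckan.example.org/oai")  # hypothetical endpoint
harvested = harvested_titles(SAMPLE_XML)
print(request_url)
print(harvested)
```

A complete harvester would also follow the `resumptionToken` that OAI-PMH uses for paging through large result sets; that step is left out here.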

What is the Hyve?
The Hyve is a company that provides professional IT services for open source biomedical informatics solutions, to enhance the quality and impact of research by enabling scientists in life sciences and healthcare research to properly use open source software, open data and open standards.
https://thehyve.nl/

Tools developed by the Hyve
Portfolio of open source tools and products that facilitate FAIR research data
FAIR Research Data Management in academic hospitals
Research Data Marts
i2b2 / tranSMART and cBioPortal (for oncology-focused medical centers)
With these tools, a robust research data warehouse can be established that exposes a unified patient-centric view of clinical and molecular data for research and analysis

What is tranSMART?
tranSMART is an open-source data warehouse designed to store large amounts of clinical data from clinical trials, as well as data from basic research. In tranSMART, data can be examined for translational research purposes. tranSMART is built on top of the i2b2 platform, a clinical data warehouse employing the i2b2 star model. Each of the data types (e.g., gene expression, SNP or metabolomics) retains its specific data structure.
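The star model mentioned above places one central fact table (observations) between several dimension tables (patients, concepts, and so on). The following sketch illustrates that layout with an in-memory SQLite database. It is a heavily simplified assumption, not the actual i2b2 or tranSMART schema: the table and column names follow i2b2 naming conventions, but the real tables have many more columns and additional dimensions, and all data values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe who and what was observed.
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT);
CREATE TABLE concept_dimension (concept_cd TEXT PRIMARY KEY, name_char TEXT);

-- The central fact table holds one row per observation.
CREATE TABLE observation_fact (
    patient_num INTEGER REFERENCES patient_dimension(patient_num),
    concept_cd  TEXT REFERENCES concept_dimension(concept_cd),
    nval_num    REAL
);

INSERT INTO patient_dimension VALUES (1, 'F'), (2, 'M');
INSERT INTO concept_dimension VALUES ('LAB:HB', 'Hemoglobin'), ('VITAL:HR', 'Heart rate');
INSERT INTO observation_fact VALUES (1, 'LAB:HB', 13.5), (2, 'LAB:HB', 14.5), (1, 'VITAL:HR', 72.0);
""")

# Typical star-schema query: join facts to a dimension, then aggregate.
rows = conn.execute("""
    SELECT c.name_char, COUNT(DISTINCT f.patient_num), AVG(f.nval_num)
    FROM observation_fact AS f
    JOIN concept_dimension AS c ON c.concept_cd = f.concept_cd
    GROUP BY c.name_char
    ORDER BY c.name_char
""").fetchall()

for name, patients, mean_value in rows:
    print(name, patients, mean_value)
```

Cohort selection works the same way: the fact table is filtered by concept and value, and the matching `patient_num` values form the patient set.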

tranSMART data warehouse
Designed for use in individual clinical studies with hundreds or thousands of participants, in which tens of thousands of observations may have been gathered
tranSMART is also being adopted by hospitals and large population studies
One such large population study is the Netherlands Twin Register (NTR)
Adding indexes, creating partitions, adding bit strings, saving subject sets for single and combined queries, splitting a query

Glowing Bear: the new tranSMART UI
Sponsored by Pfizer, Sanofi, AbbVie and Roche
Cross-study and ontology term support
Support for time series and longitudinal data
Possibility of saving queries and re-executing them later
Cohort builder

What is Dataverse?
Dataverse is an open source web application to share, preserve, cite, explore and analyze research data. Researchers, authors, publishers, data distributors, and their institutions receive appropriate credit via a data citation mechanism that includes a persistent identifier (e.g., DOI or Handle). A Dataverse repository hosts multiple dataverses; each dataverse contains datasets or other dataverses, and each dataset contains descriptive metadata together with the data.
https://dataverse.org/

Dataverse
Open source web application for sharing, citing, analyzing, and preserving research data
Developed by the Data Science team at the Institute for Quantitative Social Science
Dataverse code is open-source and free
Supports DataCite and other standards for citation and attribution, such as ORCID
Creates a Digital Object Identifier (DOI) upon deposit

Dataverse repositories
Harvard Dataverse Network hosts the world's largest collection of social science research
A Dataverse repository is the software installation, which hosts multiple dataverses
Each dataverse contains datasets, and each dataset contains descriptive metadata and data files
Dataverses may contain other dataverses
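The containment rules on this slide (a dataverse holds datasets and may hold other dataverses; a dataset holds metadata and files) form a simple tree. The sketch below models that tree to show how nested dataverses can be traversed. The class and field names are illustrative assumptions and do not correspond to the Dataverse software's own data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset: descriptive metadata plus data files."""
    title: str
    metadata: dict[str, str] = field(default_factory=dict)
    files: list[str] = field(default_factory=list)


@dataclass
class Dataverse:
    """A dataverse: contains datasets and, optionally, other dataverses."""
    name: str
    datasets: list[Dataset] = field(default_factory=list)
    children: list[Dataverse] = field(default_factory=list)

    def all_datasets(self) -> list[Dataset]:
        """Return the datasets of this dataverse and of all nested dataverses."""
        found = list(self.datasets)
        for child in self.children:
            found.extend(child.all_datasets())
        return found


# Hypothetical example hierarchy: a university dataverse with one institute below it.
institute = Dataverse("Institute", datasets=[Dataset("Trial A"), Dataset("Trial B")])
university = Dataverse("University", datasets=[Dataset("Survey 2018")], children=[institute])

all_titles = [dataset.title for dataset in university.all_datasets()]
print(all_titles)
```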

Dataverse datasets
A dataset in Dataverse is a container for data, documentation, code, and the corresponding metadata which describe the dataset
From: http://guides.dataverse.org/en/latest/user/dataset-management.html

Dataverse and Cloud Storage
Dataverse installations can be configured to facilitate cloud-based storage and computing
The default configuration for Dataverse uses a local file system for storing data
A cloud-enabled Dataverse installation can use Swift object storage for its data
This allows users to perform computations on data using an integrated cloud environment

Example: DataverseNL
Service for archiving and publishing research data on several levels (faculties, institutions, research groups, projects) within Dutch universities
Possibility to store and share online a large variety of scientific data, independent of file format, in a secure way
Not suitable for storing (privacy) sensitive data
PSI (Ψ): A Private data Sharing Interface
Privacy Tools Research Group (Harvard)

figshare for academic institutions
figshare helps academic institutions store, share and manage their research output
Integrates into the institution's CRIS/RIMS, the institutional repository and archiving solutions
All research on figshare can be pushed to any institutional repository
Control how content is shared internally and publicly
figshare is hosted on Amazon Web Services but can also integrate with a centralized cloud

Example: University of Sheffield
Custom portal to manage research data

Example: University of Salford / Manchester
Custom portal of figshare

OSF (Open Science Framework)
Cloud-based management of projects
View all projects from one dashboard
Quickly share files
Share key project information and allow others to use and cite it
See project changes
View project analytics
Archive data

Analysis of the repository ecosystem components

Role of Repositories in the Data Ecosystem
A multitude of services and tools to support research data repositories
Different types of repositories are connected and supplement each other in the storage, release and sharing of data with different degrees of protection and ownership
Tools to analyze, browse and visualize data should be integrated
Real World Data must be smoothly integrated into the research data cycle
Data governance and data privacy protection play an important role
New and efficient tools for anonymisation and data obfuscation are necessary

Overview: Research Data Sharing and Storage Services
A multitude of services and tools support research data repositories to form an open data ecosystem
Modified from: Instituuts Data Management Plannen, Groningen
[Figure: services arranged by phase (during research, after research) and by scope (institution, discipline)]
Services shown: BeeHub, B2SAFE, SurfDrive, Local ICT Services, CLARIN INL, DANS, 4TU.Centre for Research Data, Zenodo, B2SHARE, figshare, SURF, addgene, Brainmap.org, NeuroVault, OpenMRI, MycoBank, Language Archive, DataFirst, DataverseNL, CancerData, DRYAD, Connectome, SeaDataNet, nesstar, TalkBank, OpenML, OpenClinica, Curate Science, EVIDENCIO, OSF, CRCNS, Dataverse, RUG GeoData

What is Real World Evidence?
Real world evidence (RWE) in medicine means evidence obtained from real world data (RWD), which are data obtained outside the context of randomized controlled trials (RCTs) and generated during routine clinical practice. Real world data are stored in Electronic Health Records (EHR), medical claims or billing databases, registries, patient-generated data, mobile devices, etc. In addition, they may be derived from retrospective or prospective observational studies and observational registries.
The necessity for RWD is based on the fact that clinical trials cannot account for the entire patient population of a particular disease. Patients who suffer from comorbidities, belong to a particular geographic region, have genetic variations or are of advanced age generally do not participate in clinical trials.

Rigorous evidence-based investigation
The management of human health and diseases, including policy and decision making and the development of efficient healthcare systems, demands support by efficient and rigorous evidence-based investigation and evaluation of research results. Data are therefore central to further improvements in public health, primary and hospital care, and especially to the advancement of personalized medicine. Relevant data should be collected as part of usual healthcare, from routine administrative sources and from research studies. Data governance and data privacy protection should begin as early as possible, ideally during data generation.

Dealing with Real World Clinical Data
Real World Clinical Data play an important role for research
For Patient Reported Outcomes and for Sensor Data: open source RADAR stack (RADAR-CNS project)
RADAR-base Management Portal is a one-stop shop for managing remote patient monitoring studies
The RADAR Android apps exchange data directly with patients and other care providers
Kafka-based stack: message transport system
European health data networks
Observational Health Data Sciences and Informatics (OHDSI), Observational Medical Outcomes Partnership (OMOP)

Results of Ecosystem Analysis
It doesn't matter where one stores data
Everything is connected
Institutional repositories (dataverses), data marts, general repositories, domain specific repositories, figshare for data sharing
An ecosystem for open data management
Covers complete data life cycle
Complete projects are supported
FAIR data as basis
tranSMART as integration hub for analysis
Integration of data governance and privacy protection at the stage of data generation
But can sensitive data really be integrated?
Not yet convincingly shown!

Contact
Wolfgang Kuchinke
Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
wolfgang.kuchinke@uni-duesseldorf.de
Presentation contains additional material for explanation and workshop.
