The document describes a pilot implementation of a security framework to enable legally compliant sharing of sensitive biomedical data between research infrastructures. The goals are to comprehensively analyze ethical, legal and security requirements, develop a security framework that ensures compliance with regulations and protects privacy/security, and enable sharing of medical data for clinical research. The pilot will demonstrate secure access to data from biobanks and clinical trials by implementing identity management, authorization workflows, and data access policies. The developed tools aim to have impact by aiding legal and ethical data sharing, though maintaining them long-term requires commitment from research organizations.
Secure access to biomedical data sources for legal data sharing-kuchinke
1. Enabling Secure Access to
Biomedical Data Sources
Pilot installation for legally compliant data
sharing in BioMedBridges
Seminar AGM Meeting in Munich, Germany
18 February 2016
Wolfgang Kuchinke
2. Aim
- Comprehensive analysis of ethical, legal and regulatory requirements and of data security risks for data sharing between Research Infrastructures
- Development of a security framework to ensure that data bridges of BioMedBridges are compliant with regulations and protect privacy and data security
- Enable the sharing of sensitive medical data for clinical research at Research Infrastructures
W. Kuchinke (2016)
3. Background
- Translational research is a promising approach to improve the discovery of new therapies
- Collaboration of biomedical researchers and clinical practitioners is required
- Translational research is data intensive and needs support of information technology to enable efficient data sharing, data linking and analysis
- Access to data of clinical trials from diverse sources can improve the discovery of benefits, but also of unwanted effects of drugs and therapies
4. BioMedBridges
- The BioMedBridges project is developing a pilot implementation that integrates research data and clinical data resources
- Aim is to enable data bridges for data sharing between research infrastructures
- Access to data is restricted by data access policies, rules and regulations
- Requirement of data access approvals and data management plans
- The pilot implementation can demonstrate that secure access to sensitive data for research purposes is possible
- Access approvals can be automated
- Sophisticated identity and permissions management can be realised and simplified by using well-known Open Source software components
8. Project Plan
- Development of a security framework
- Work on WT1-7 has been finished
- Identification of data security and privacy requirements for data bridges
- Results were compiled in D5.1 and D5.2
- Support for the assessment of ethical and legal requirements
- Legal Assessment Tool (LAT)
- Design of the security architecture and authorisation framework
10. How are we meeting our objectives?
- Generation of security requirements for an e-infrastructure based on use cases
- Threat and risk analysis for sharing data
- Development of threat mitigation strategy
- Implementation of a pilot to verify the feasibility of the security framework
- Collaborative implementation of secure access to sensitive human data (example of biobank data) involving ELIXIR and BBMRI
11. Security architecture - Overview
• Risk analysis was conducted to identify countermeasures against threats
  - Authentication, authorization, secure communication, encryption of data, anonymization, pseudonymization, auditing and data provenance
  - Deployment solutions for the mitigation of identified risks
• Specification of three access tiers
  - Open, restricted, and committee-controlled
• Activity diagrams indicate required actions for data sharing by secure data bridges
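The three access tiers named above can be illustrated with a minimal Python sketch. The slides only name the tiers; the rule that "restricted" requires an authenticated user and "committee-controlled" additionally requires approval by a Data Access Committee is an assumption made for illustration, and `AccessTier` and `may_access` are hypothetical names.

```python
from enum import Enum


class AccessTier(Enum):
    """The three tiers named on the slide."""
    OPEN = "open"
    RESTRICTED = "restricted"
    COMMITTEE_CONTROLLED = "committee-controlled"


def may_access(tier, authenticated, dac_approved):
    """Return True if a user may access a resource in the given tier.

    Assumption (not stated in the slides): restricted data need an
    authenticated user; committee-controlled data also need DAC approval.
    """
    if tier is AccessTier.OPEN:
        return True
    if tier is AccessTier.RESTRICTED:
        return authenticated
    # Remaining case: committee-controlled
    return authenticated and dac_approved
```

A caller would evaluate `may_access(AccessTier.RESTRICTED, authenticated=True, dac_approved=False)` before serving a restricted resource.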
13. Regulating access to data
• Sensitive and confidential data can be safeguarded by regulating or restricting the access and the usage of data
• Access controls should always be proportionate to the kind of data
• Example:
  - UK Data Service has three levels of access for data
  - safeguarded data: data that contain no personal information, but for which the data owner considers that a risk of disclosure is possible by linkage to other data
  - controlled data: data that may be disclosive
  - open data: data without personal information
• Controlled data are only available to users who have been trained and accredited and whose data usage has been approved by a relevant Data Access Committee (DAC)
14. Regulating access to data
• Example: The European Genome-phenome Archive (EGA)
  - Services for archiving, processing and distribution for all types of potentially identifiable genetic and phenotypic human data at the European Bioinformatics Institute (EBI)
• Controlled access data is defined by the original informed consent agreements signed by the participants involved in the study
  - Controlled access data often consists of human data derived from medical research
  - All data submitted to the EGA must be subject to controlled access
  - Access to data is controlled by a Data Access Committee (DAC), which must be registered as part of the submission process
15. Infrastructure elements
• BioSamples Database from ELIXIR
• BBMRI catalogue (BBMRI Hub) extended with a MIABIS layer and a data cube layer
• Resource Entitlement Management System
• Legal Assessment Tool
16. Data Access Committee (DAC)
• A body of one or more named individuals who are responsible for data release to external requestors based on consent and/or National Research Ethics requirements
• To establish a DAC at the EGA, the details of the DAC must be registered during the submission process via either Webin or DAC.xml
• Each dataset that is submitted to EGA databases must be linked to a Data Access Agreement (DAA), which defines the terms and conditions of using the dataset
17. Resource Entitlement Management System
• The Resource Entitlement Management System (REMS) is open source software
• It manages policies for granting access to data resources
• For example:
  - An application for access is required to get access to clinical data from a web application like BBMRI Hub
  - The applicant has to provide information about the purpose of research, the needed types of data access and the data protection requirements
  - Approval is granted by a Data Access Committee (DAC)
18. Resource Entitlement Management System
• REMS allows data managers to define an authorisation workflow for each data source
• This workflow can be employed by software systems to ensure that users are entitled to access data
• REMS manages agreements between multiple data owners, data sets and data users
• It delegates responsibilities to the DAC using data access agreements
• It is being successfully used by the European Genome-phenome Archive
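The authorisation workflow described on the two REMS slides can be sketched as a small state machine in Python. This is not the REMS API: `Application`, `EntitlementRegistry` and the state names are hypothetical, and the sketch assumes a single approval step by a DAC.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """A hypothetical access application, not a REMS data structure."""
    applicant: str
    resource: str
    purpose: str
    state: str = "draft"


@dataclass
class EntitlementRegistry:
    """Tracks which (user, resource) pairs have been approved."""
    entitlements: set = field(default_factory=set)

    def submit(self, app):
        # The applicant must state the purpose of the research.
        if app.state != "draft":
            raise ValueError("only draft applications can be submitted")
        if not app.purpose.strip():
            raise ValueError("purpose of research is required")
        app.state = "submitted"

    def decide(self, app, approved):
        # Assumed single decision step taken by the Data Access Committee.
        if app.state != "submitted":
            raise ValueError("only submitted applications can be decided")
        app.state = "approved" if approved else "rejected"
        if approved:
            self.entitlements.add((app.applicant, app.resource))

    def is_entitled(self, user, resource):
        return (user, resource) in self.entitlements
```

A software system in front of a data source would call `is_entitled` before serving data, which mirrors the idea that the workflow ensures users are entitled to access data.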
19. Shibboleth Single Sign-on
• Shibboleth is a single sign-on log-in system for computer networks and the Internet
• It is an open source implementation for identity management and federated identity-based authentication and an authorization (or access control) infrastructure
• It is based on Security Assertion Markup Language (SAML)
• Federated identities allow the sharing of information about users from one security domain to another one
• This allows for cross-domain single sign-on and removes the need for content providers to maintain user names and passwords
• Identity providers (IdPs) supply user information, while service providers (SPs) use this information to give access
20. Shibboleth technology
• Web-based technology that implements the HTTP/POST artifact and attribute push profiles of SAML, including both Identity Provider (IdP) and Service Provider (SP) components
• SAML authentication assertion with a temporary "handle" contained within it
  - This handle allows the IdP to recognize a request about a particular user as corresponding to the principal that authenticated earlier
• The SP can request a specific type of authentication from the IdP
• Shibboleth 2.0 supports additional encryption capacity
• Attributes
  - Access control is performed by matching attributes supplied by IdPs against rules defined by SPs
• The SP makes an access decision based on the attributes
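The attribute matching performed by a service provider can be illustrated with a minimal Python sketch. It does not use Shibboleth itself: `is_authorized` is a hypothetical helper, the rule format (every named attribute must carry at least one accepted value) is an assumption, and the attribute values are made up for illustration.

```python
def is_authorized(user_attributes, rule):
    """Match IdP-supplied attributes against an SP-defined rule.

    user_attributes: dict mapping attribute name -> list of values
    rule: dict mapping attribute name -> set of accepted values
    Assumption: all attributes named in the rule must match.
    """
    for name, accepted in rule.items():
        values = user_attributes.get(name, [])
        if not any(value in accepted for value in values):
            return False
    return True


# Hypothetical rule and attribute values, for illustration only.
rule = {
    "eduPersonAffiliation": {"staff", "faculty"},
    "eduPersonEntitlement": {"urn:example:biobank:dataset-1"},
}
attributes = {
    "eduPersonAffiliation": ["member", "staff"],
    "eduPersonEntitlement": ["urn:example:biobank:dataset-1"],
}
decision = is_authorized(attributes, rule)
```

Keeping the rule on the SP side means the IdP only supplies facts about the user, while the access decision stays with the resource owner.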
22. SAML
• The Security Assertion Markup Language (SAML) is a standard for exchanging authentication and authorization data
• SAML is an XML-based protocol that uses security tokens containing assertions to pass information about a principal (end user) between a SAML authority, named an Identity Provider (IdP), and a SAML consumer, named a Service Provider (SP)
  - Enables web-based, cross-domain single sign-on (SSO)
  - An assertion is a package of information that supplies statements made by a SAML authority
  - Statements are usually made about a subject, represented by the <Subject> element
• An authorization decision assertion states whether a request to allow the assertion subject to access the specified resource has been granted or denied
• An important type of SAML assertion is the so-called "bearer" assertion used to facilitate Web Browser SSO
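To make the structure of an assertion concrete, the following Python sketch reads the subject and attributes from a hand-written SAML 2.0 assertion using only the standard library. The example assertion, issuer URL and attribute name are invented for illustration, and the sketch deliberately skips signature and condition checks, so it is not suitable for real authentication.

```python
import xml.etree.ElementTree as ET

# Namespace of SAML 2.0 assertions.
SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

# Invented, unsigned example assertion (illustration only).
EXAMPLE_ASSERTION = """\
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
                Version="2.0" ID="_example" IssueInstant="2016-02-18T09:00:00Z">
  <saml:Issuer>https://idp.example.org/idp</saml:Issuer>
  <saml:Subject>
    <saml:NameID>researcher-handle-123</saml:NameID>
  </saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="entitlement">
      <saml:AttributeValue>biobank-dataset-42</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>
"""


def read_assertion(xml_text):
    """Return (name_id, attributes) from an assertion; no validation is done."""
    root = ET.fromstring(xml_text)
    name_id = root.findtext("saml:Subject/saml:NameID", namespaces=SAML_NS)
    attributes = {}
    for attr in root.iterfind("saml:AttributeStatement/saml:Attribute", SAML_NS):
        values = [v.text for v in attr.iterfind("saml:AttributeValue", SAML_NS)]
        attributes[attr.get("Name")] = values
    return name_id, attributes


name_id, attributes = read_assertion(EXAMPLE_ASSERTION)
```

In a deployment, a SAML library would verify the signature and validity conditions before any attribute is trusted.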
23. MIABIS
• The Minimum Information About Biobank data Sharing (MIABIS) standardizes the data elements used to describe biobanks, research on samples and associated data
• The MIABIS Community Standards have several granularity levels
  - Support of interoperability between biobanks sharing their data
  - General attributes to describe biobanks, sample collections and studies at an aggregated/metadata level (MIABIS Core 2.0)
  - New MIABIS modules describe samples and sample donors at individual level
24. BioSamples Database
• BioSamples Database stores and supplies descriptions and metadata about biological samples used in research
• Sample data are from many sources (e.g. 1000Genomes, HipSci, FAANG) or have been used by European Nucleotide Archive (ENA) or ArrayExpress
• The BioSamples Database aggregates information for reference samples, as well as samples for which data exist in one of the EBI's other data resources
• All samples are described with annotations
  - These annotations can be linked to ontology or controlled vocabulary
25. Legal Assessment Tool (LAT)
• We implemented an approach where relevant concepts derived from computer science were adapted to define legal requirements for data bridges
• "Legal interoperability" as an extension of the general interoperability concept for diverse systems
• Data flow descriptions serve as the basis for the specification of legal and ethical requirement clusters
26. Legal Assessment Tool (LAT)
• Usage of LAT
  - Answering a short survey
  - LAT assesses the given answers and shows the legal texts and rules / regulations that are relevant for the researcher's situation
• Availability: http://hhu2.at.xencon.de/web/guest/assessmenttool
• To provide guidance, a specific decision pathway is followed
30. Pilot for security framework
• Demonstration of feasibility and usability of the security architecture
• Collaborative implementation of secure access
• Based upon use case:
  - Researcher wants to obtain data about biosamples, researcher searches for a biobank harbouring data about disease group x, containing at least y samples of material type z
• Pilot will implement a complete workflow for data search and data sharing
  - Instantiating the security architecture with all access tiers
  - Enabling queries and data sharing in a secure, privacy preserving, legally and ethically sound manner
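The use case query (disease group x, at least y samples of material type z) can be sketched in Python over aggregated catalogue records. The `Collection` record and `find_biobanks` function are hypothetical and do not reflect the BBMRI Hub data model; the sketch assumes that sample counts of matching collections are summed per biobank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """Hypothetical aggregated record, not the BBMRI Hub schema."""
    biobank: str
    disease_group: str
    material_type: str
    sample_count: int


def find_biobanks(collections, disease_group, material_type, min_samples):
    """Return sorted names of biobanks holding at least min_samples
    samples of material_type for disease_group (counts summed per biobank)."""
    totals = {}
    for c in collections:
        if c.disease_group == disease_group and c.material_type == material_type:
            totals[c.biobank] = totals.get(c.biobank, 0) + c.sample_count
    return sorted(name for name, total in totals.items() if total >= min_samples)


# Invented example data.
catalogue = [
    Collection("Biobank A", "diabetes", "serum", 120),
    Collection("Biobank A", "diabetes", "serum", 40),
    Collection("Biobank B", "diabetes", "serum", 90),
    Collection("Biobank C", "diabetes", "dna", 500),
]
hits = find_biobanks(catalogue, "diabetes", "serum", 100)
```

Because the query runs on aggregated counts only, it fits the open tier; individual-level data would still require the authorisation steps of the pilot.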
31. Identity Management via Shibboleth
• Use of popular identity federation standard SAML
• Open source Shibboleth is available for many platforms and applications
• Shibboleth offers simple methods to wrap areas of web applications (e.g., via URL patterns)
  - Therefore, before serving a web request, an unauthenticated user can be forwarded to the login process
  - User can select an identity provider (IdP)
• After authentication, Shibboleth creates a user session using identity attributes, which are sent back to the original requester
• The tool acting as a service provider (SP) can check the existence of a session and the associated attributes
32. Pilot workflow process
• The workflow supports the research use case where a user is looking for sample data
• EBI's BioSamples Database is the starting point for the use case
• Using LAT, the researcher searching for specific data gets legal / ethical requirements to consider when accessing and sharing human data
• The BBMRI Hub provides summary information for the requested dataset (makes the data findable)
• It mediates the access to individual level data (e.g., anonymised patient records)
• The Hub checks that the requesting user has an associated Shibboleth session
33. Pilot workflow process
• If the user has no Shibboleth session, Shibboleth redirects the user to a sign-on page
• After successful authentication the request is sent back to the Hub, where the user session is used to query REMS-related attributes
• These attributes contain the list of resource entitlements associated with the user, such as database access rights
• This information is obtained by coupling REMS with Shibboleth's Attribute Authority component
• If the user does not have the rights to access the requested data, the user is forwarded to REMS to apply for such access
• The user is notified of an approved application by email, which also contains a link to continue with accessing the protected database
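The decision sequence of the pilot workflow can be summarised in a short Python sketch. It is a simplification: `route_request`, the session dictionary and the returned action names are hypothetical and stand in for the interplay of the Hub, Shibboleth and REMS.

```python
def route_request(session, resource):
    """Decide the next step for a data request, following the pilot workflow.

    session: None if the user has no Shibboleth session, otherwise a dict
             with a hypothetical "entitlements" set obtained via REMS.
    """
    if session is None:
        # No session: the user is redirected to the sign-on page.
        return "redirect_to_sign_on"
    if resource not in session.get("entitlements", set()):
        # Authenticated but not entitled: apply for access in REMS.
        return "forward_to_rems_application"
    # Authenticated and entitled: the Hub mediates access to the data.
    return "grant_access"


session = {"user": "alice", "entitlements": {"biobank-a"}}
steps = [
    route_request(None, "biobank-a"),
    route_request(session, "biobank-b"),
    route_request(session, "biobank-a"),
]
```

The order of the checks matters: entitlements can only be looked up once the user's identity is known from the session.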
34. Workflow of pilot implementation: Authorisation after request for data access
35. Progress of pilot implementation
• Implementation of SSO using Shibboleth
• Extension of the BBMRI catalogue (BBMRI Hub)
• Integration of REMS in order to support authorization
Remaining tasks:
• Integration of (mockup) biobanks
• Integration of the LAT
• Sustainability of developed tools and framework
36. Support for additional security and data protection issues
• Data access agreements provide a balance between the need for access to medical data and ELSI demands
• It is important that the process of granting access to data and monitoring the access once it is approved relies on efficient and robust tools
• All tools must operate within a secure and ethical framework
• One can leverage systems like REMS to improve the workflow of data access approval processes
• Possibility for the extension of data access approvals to clinical trials
  - This can improve accountability and trust in the scientific community
  - May help to overcome obstacles for the sharing of clinical study data
38. Impact
- User and stakeholder feedback of LAT
- The tool is publicly available, comments and suggestions were gathered online
- Presentation of LAT
- BMB AGM (Florence), GMDS (Göttingen), Knowledge Exchange workshop (Berlin), RDA
- Comment of evaluators: "This is the tool we need!"
- Comparison of LAT with tools of other initiatives
- BBMRI legal WIKI, hSERN, IPAC
- These tools complement each other
- Security framework
- Feasibility and utility of the security framework is demonstrated by a pilot implementation
- Example for building secure data bridges
39. Impact of the developments
- The most significant (and lasting) impact?
  - Knowledge exchange workshop (30 June 2014, Berlin)
  - Possibilities were created to use a software tool for legal and ethical interoperability
- Additional opportunities with a strategic impact?
  - Collaboration with RDA / IPAC
  - Extension of LAT knowledge base with national legislation and rules would be necessary
  - Incorporation of legal ontology into LAT
  - Employment by BBMRI
  - Employment by ECRIN
40. Sustainability of Legal Assessment Tool
- Maintained and developed by ELIXIR, ECRIN
- Enabling legal and ethical interoperability when using / re-using medical data for research purposes
- Plans for extension and sustainability exist
- Regular updates of the knowledge base
- Enlargement of ontology, development of a graphic wizard for data entry, inclusion of examples from use cases
- But knowledge not in place to maintain LAT beyond the project?
- Source code is openly available on request
- Decision on suitable Open Source licence
41. Sustainability of framework of pilot
• Security framework
  - Blueprint for secure, privacy preserving, ethically and legally sound data bridges between research infrastructures
  - Can educate users about risks and risk mitigation
• BBMRI catalogue of pilot
  - Maintained and developed in BBMRI
  - The code is not openly available
• BioSamples Database
  - Maintained by ELIXIR
• Necessary knowledge is in place to maintain these resources beyond the project
• The future maintenance of the other tools is uncertain
43. Lessons learned for the security and data protection
• The combination of Shibboleth and REMS efficiently limits the risk of disclosing confidential information to unauthorised persons
• REMS ensures that data access authorisation is granted according to the specific legal requirements associated with a data set (consideration of informed consent given by the patient / data provider)
• Activity tracking allows evidence of compliance with regulations to be checked, so that users and data managers can be held accountable for their actions
44. Lessons learned from the pilot
- What worked?
  - Collaboration within BioMedBridges was essential
  - Knowledge exchange with external stakeholders was decisive
  - Developing an interactive legal / ethical Web 2.0 tool
  - Development of a complex security framework for tools
- What did not work?
  - Initiating collaboration with external stakeholders and research infrastructures
  - Maintenance and further development of the tool after the project's end (sustainability question)
  - Development of a single and joint framework for secure, legally and ethically compliant research by all research infrastructures
45. Additional linked ethical and legal governance tools
• The Legal Assessment Tool (LAT) guides researchers with no legal knowledge through relevant legal requirements for data sharing
  - LAT provides researchers with an online, interactive selection process to characterise the involved types of data and databases and provides suitable requirements and recommendations for data sharing
  - Links to the LAT were added to the BBMRI Hub
  - It guides data managers when assessing data sharing policies
• The Human Sample Exchange Regulation Navigator (hSERN) is a web resource featuring legal aspects involved in exchanging human information
• The BBMRI Legal Wiki can provide useful information when data has to be exchanged between EU member countries
• The International Policy interoperability and data Access Clearinghouse (IPAC) is a tool providing information about policy interoperability at the international level
  - It can provide direct help to the members of DACs who need to create forms within REMS for users
46. References
- Legal Assessment Tool (LAT). http://www.biomedbridges.eu/sharingsensitive-data
- Human Sample Exchange Regulation Navigator. http://www.hsern.eu/
- BBMRI Legal WIKI. http://www.bbmri-wp4.eu/wiki/index.php/Main_Page
- IPAC. http://p3g.org/ipac
- Brandizi, M., Melnichuk, O., Sarkans, U., Bild, R., Kohlmayer, F., et al. (2016, February 11). BioMedBridges: Implementation of a pilot for the security framework. Zenodo. http://doi.org/10.5281/zenodo.45927
47. References
- Kuchinke W, Krauth C, Bergmann R, Karakoyun T, Woollard A, Schluender I, Braasch B, Eckert M, Ohmann C. Legal assessment tool (LAT): an interactive tool to address privacy and data protection issues for data sharing. BMC Med Inform Decis Mak. 2016 Jul 7;16(1):81
- Brandizi M, Melnichuk O, Bild R, Kohlmayer F, Rodriguez-Castro B, Spengler H, Kuhn KA, Kuchinke W, Ohmann C, Mustonen T, Linden M, Nyrönen T, Lappalainen I, Brazma A, Sarkans U. Orchestrating differential data access for translational research: a pilot implementation. BMC Med Inform Decis Mak. 2017 Mar 23;17(1):30
48. Contributions
- Part of the presentation was presented earlier by Wolfgang Kuchinke, Raffael Bild, Florian Kohlmayer, Klaus A. Kuhn