Security, privacy and trust: why and
how might we control access to
research data?
Paul Burton, Rebecca Wilson
University of Bristol, D2K Research Program
NISO Symposium, Denver
11th September, 2016
• Perhaps the most important message is in the
title:
• This is a complex challenge involving science,
technology, governance and other fundamental
social issues
• No single solution will be adequate
• True transdisciplinary programs of work are essential
• Even the most complex and sophisticated of
solutions will never offer fully effective exploitation
of available data with zero risk of mistakes in
managing data or of malign interference
Security, privacy and trust
From Preface:
“Our view has always been that
anonymisation is a heavily context-
dependent process and only by
considering the data and its
environment as a total system
(which we call the data situation),
can one come to a well informed
decision about whether and what
anonymisation is needed.”
Controlling access to
research data (security):
why and when?
• Who might share data?
• Distinct generator and user
• Share data across a consortium
• How is ‘sharing’ achieved?
• Physically transfer data to a user
• Provide access to analyse
• Analysis on-site
• Remote analysis
• Federated analysis
• All interpretations valid and important
What does “sharing” research microdata mean?
• Management of intellectual property invested
and held in data – most areas of research
• Legal, ethical and other governance stipulations
to protect the welfare of research participants –
particularly in health/social/biomedical research
• Disclosure of identity
• Disclosure of associated information
• Balance between these and the societal benefits of
streamlined comprehensive data access – which is
evolving rapidly with time and social context
Why control research data at all?
The Research
Data Pipeline
When are
data “at risk”?
The Research
Data Pipeline
When are
data “at risk”?
Dissemination
and evidence
based action
The Research
Data Pipeline
When are
data “at risk”?
Dissemination
and evidence
based action
• Risks associated with: storage; transmission; use
• Accidental v deliberate violations
• Direct v inferential disclosure
• Risks and remedies lie in the nature of the data
themselves and the contextual environment in
which the data are to be used – and potentially
misused
• Mark Elliot, Elaine Mackey, Keith Spicer, Caroline Tudor,
the Anonymisation Decision-Making Framework, 2016
Issues to consider
• Consider the user(s)
• Are he/she/they bona fide researchers?
• Consider the application – for example:
• Does it violate (or potentially violate) any of the
ethical permissions granted to the study or any
of the consent forms signed by the
participants or their guardians?
• Is there a risk it may produce information that
may allow individual cohort members to be
identified?
How to implement control in practice? UK
MetaDAC and ALSPAC as illustrative examples
• Administrative and research data held
separately
• Hard copies of data held in locked storage
• Electronic data held on password protected
systems with access restricted to those who
really need it
• All electronic data held in encrypted form
• Extensive QC (security of quality)
• Multiple back-ups (security of existence)
Managing the data and the data environment
• All data pseudonomised before release
• If pseudonomisation scientifically impossible,
data can be analysed ‘on site’
• All data transfers encrypted using standard
protocols
• All linked data released on study-specific IDs
• Explicit acknowledgment that no system can
guarantee a zero risk of disclosure or misuse of
data
Managing the data and the data environment
• A strong underpinning governance structure
is essential.
• EAGDA (Expert Advisory Group on Data
Access) report 2015 considered, amongst
many other key issues that:
EAGDA report, 2015
• Governance must be proportionate and
context appropriate
• Must be transparent, auditable and
appealable
• Need mutual trust and respect amongst
stakeholders
• Applicants for data through MetaDAC or from
ALSPAC sign up agreeing to governance
documents which include statements such as:
• Applicants are reminded that the Terms and Conditions for the
cohort explicitly forbid any attempt to identify individuals or to
compromise or otherwise infringe the confidentiality of
information on data subjects and their right to privacy.
• Do you understand that you must not pass on any data or
samples awarded, or any derived variables or genotypes
generated by this application to a third party (i.e. to anybody
that is not included in this list of applicants on this project, nor is
a direct employee of one of these applicants)?
How to implement control in practice? UK
MetaDAC and ALSPAC as illustrative examples
Role of encryption and other
technology-based forms of
privacy protection in “open
science”
• When research data are very sensitive or are seen as
having a particular intellectual property value can we
develop technology-based solutions that facilitate
access to microdata by enhancing privacy protection
so that all intellectual property and governance
constraints are met in full while lowering the
governance bar? This can promote open science by
easing and/or speeding up access requests
• Should be seen as an additional component to be
applied on top of a data access and governance
system that is already well founded
• EAGDA report emphasises sustainability
Privacy protection in “open science”
Data
Computer
DC
Analysis
Computer
AC
Single site DataSHIELD
2009: The DataSHIELD
challenge
Given that microdata are scientifically
critical and yet potentially sensitive,
can we ensure that the information
driving analysis of the data at each
centre only ever emerges from the
firewall in non-disclosive form? (i)
encryption (trivial and non-trivial);
(ii) low dimensional (ideally
sufficient) statistics
Multi-site DataSHIELD
horizontally partitioned data
• One step analyses: e.g.
ds.table2D - request
non-disclosive output
from all sources
• Multi-step analyses: e.g.
ds.lexis – set up and then
request
• Iterative analyses: e.g.
ds.glm - parallel
processes linked
together by non-
identifying summary
statistics – e.g. for glm
= score vectors and
information matrices
• Can be used as
equivalent to
full individual
level analysis or
to study level
meta-analysis
The DataSHIELD
solution
DataSHIELD
b.vector<-c(0,0,0,0)
glm(cc~1+BMI+BMI.456+SNP,
family=binomial,
start=b.vector, maxit=1)
Analysis commands (1)
Information Matrix Study 5
Score vector Study 5
Summary Statistics (1)
[36, 487.2951, 487.2951, 149]
Information Matrix Study 5
Score vector Study 5
Summary Statistics (1)
DataSHIELD
Σ Information Matrix Study 5
Score vector Study 5
Summary Statistics (1)
[36, 487.2951, 487.2951, 149]
Information Matrix Study 5
Score vector Study 5
Summary Statistics (1)
DataSHIELD
b.vector<-
c(-0.322, 0.0223, 0.0391, 0.535)
Analysis commands (2)
glm(cc~1+BMI+BMI.456+SNP,
family=binomial,
start=b.vector, maxit=1)
DataSHIELD
and so on .....
Updated parameters (4)
Σ
Coefficient Estimate Std Error
Intercept -0.3296 0.02838
BMI 0.02300 0.00621
BMI.456 0.04126 0.01140
SNP 0.5517 0.03295
Final parameter estimates
DataSHIELD
DataSHIELD analysis
Direct conventional analysis
Parameter Coefficient Standard Error
bintercept
-0.3296 0.02838
bBMI
0.02300 0.00621
bBMI.456
0.04126 0.01140
bSNP
0.5517 0.03295
Coefficients:
Estimate Std. Error
(Intercept) -0.32956 0.02838
BMI 0.02300 0.00621
BMI.456 0.04126 0.01140
SNP 0.55173 0.03295
Does it
work?
Server-side functions
Client-side functions
Individual level data never
transmitted or seen by the
statistician in charge, or by
anybody outside the
original centre in which
they are stored.
R
R
R R
Web services
Web servicesWeb services
Data server
Opal
Finrisk
Opal
Prevend
Opal
1958BC
Data server
Data server
BioSHaRE
web site
Web services
Analysis
client
DataSHIELD: current implementation for
horizontally partitioned data
IM5:
Analysis
Computer
R
R
R R
Web services
Web servicesWeb services
Data computer Opal
NHS
Opal
ALSPAC
Opal
Education
Data computer Data computer
Regression coefficients = XTY/ XTX
XTX: Need to calculate
XAXA XAXB XAXC
XAXB XBXB XBXC
XAXC XBXC XCXC
XA
XB
XAXB
XA1 * XB1
+
XA2 * XB2
+
XA3 * XB3
+
……
DataSHIELD: current implementation for
vertically partitioned (linked) data
IM5:
Analysis
Computer
R
R
R R
Web services
Web servicesWeb services
Data computer Opal
NHS
Opal
ALSPAC
Opal
Education
Data computer Data computer
Regression coefficients = XTY/ XTX
XTX: Need to calculate
XAXA XAXB XAXC
XAXB XBXB XBXC
XAXC XBXC XCXC
MA
MB
MC
XA
XB
XAXB
XA1 * XB1
+
XA2 * XB2
+
XA3 * XB3
+
……
DataSHIELD:current
implementationfor vertically
partitioned(linked) data
plain.text.vector.A plain.text.vector.N
0 1 1 1 0 0 1 1 1 0 1 0 0 1
encryption.matrix
[,1] [,2] [,3]
[1,] -1.444769 2.495677 -5.322736
[2,] -1.355529 -9.369041 2.687347
[3,] 4.603762 -3.622044 -2.817478
occluded.matrix.A
[,1] [,2] [,3]
[1,] -1.4546711 0 4.0722205
[2,] 6.4809785 1 -4.5814726
[3,] 4.4954801 1 -8.7036260
[4,] 0.1995684 1 -8.6872205
[5,] -6.4060220 0 -6.6471777
[6,] -0.5164345 0 -0.2564673
[7,] -5.8981933 1 -8.5032852
IM5:
Analysis
Computer
R
R
R R
Web services
Opal
NHS
Opal
ALSPAC
Opal
Education
Data computer Data computer
Regression coefficients = XTY/ XTX
XTX: Need to calculate
XAXA XAXB XAXC
XAXB XBXB XBXC
XAXC XBXC XCXC
MA XT
A
MAXT
AXBMB
MA
XA
XB
MB
(MA)-1 MAXT
AXBMB (MB)-1 = XAXB
DataSHIELD:current
implementationfor vertically
partitioned(linked) data
plain.text.vector.A plain.text.vector.N
0 1 1 1 0 0 1 1 1 0 1 0 0 1
encryption.matrix
[,1] [,2] [,3]
[1,] -1.444769 2.495677 -5.322736
[2,] -1.355529 -9.369041 2.687347
[3,] 4.603762 -3.622044 -2.817478
occluded.matrix.A
[,1] [,2] [,3]
[1,] -1.4546711 0 4.0722205
[2,] 6.4809785 1 -4.5814726
[3,] 4.4954801 1 -8.7036260
[4,] 0.1995684 1 -8.6872205
[5,] -6.4060220 0 -6.6471777
[6,] -0.5164345 0 -0.2564673
[7,] -5.8981933 1 -8.5032852
The core DataSHIELD
Development Team
Becca Wilson
If people want to know technical details about DataSHIELD or methods for secure data
sharing/analysis: see 12th September 11:30 - 13:00 Secure Multiparty Computation for Statistical
Analysis of Private Data
Demetris Avraam
Poster #12 RDA poster session Wednesday, Thursday, Friday – DataSHIELD: a method for privacy
protected analysis of individual level data
RDA Working Group for Data Security and Trust
RDA 8th Plenary in the same venue as the NISO symposium. Session on 17th September 11:00 -
12:30. We are running a survey to gather information on current data security practices in our
community. The survey is available at www.bit.ly/dash-ing (see below)
Data to Knowledge (D2K) Research Group. Contact details:
@Data2Knowledge there is also a "contact us" page on www.datashield.ac.uk
Setting up a professional community for stakeholders in health data sharing
DASH-ING: DAta Sharing for Health - INnovation Group
The website will initially (and temporarily) be at: www.bit.ly/dash-ing This webpage contains
questions for RDA survey and link to join the professional community
Additional opportunities for interaction
THANK YOU FOR LISTENING
> plain.text.vector.L
[1] 0 1 1 1 0 0 1
> plain.text.vector.N
[1] 1 1 0 1 0 0 1
> sum(plain.text.vector.L*plain.text.matrix.N)
[1] 3
>t(matrix(plain.text.vector.L))%*%matrix(plain.text.vector.N)
[,1]
[1,] 3
How does matrix-based encryption
work?
> occluded.matrix.L > plain.text.vector.L
[,1] [,2] [,3] [1] 0 1 1 1 0 0 1
[1,] -1.4546711 0 4.0722205
[2,] 6.4809785 1 -4.5814726
[3,] 4.4954801 1 -8.7036260
[4,] 0.1995684 1 -8.6872205
[5,] -6.4060220 0 -6.6471777
[6,] -0.5164345 0 -0.2564673
[7,] -5.8981933 1 -8.5032852
> e.mat.L
[,1] [,2] [,3]
[1,] -1.444769 2.495677 -5.322736
[2,] -1.355529 -9.369041 2.687347
[3,] 4.603762 -3.622044 -2.817478
> e.mat.L%*%occluded.matrix.L
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] -19.57369 17.51813 42.32785 48.44713 44.636397 2.11123627 56.277949
[2,] 12.91532 -30.46620 -38.85246 -32.98514 -9.179719 0.01082581 -24.225142
[3,] -18.17035 39.12303 41.59635 21.77277 -10.763524 -1.65495077 -6.818104
How does matrix-based encryption
work?
> e.mat.L%*%occluded.matrix.L
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] -19.57369 17.51813 42.32785 48.44713 44.636397 2.11123627 56.277949
[2,] 12.91532 -30.46620 -38.85246 -32.98514 -9.179719 0.01082581 -24.225142
[3,] -18.17035 39.12303 41.59635 21.77277 -10.763524 -1.65495077 -6.818104
> plain.text.vector.N
[1] 1 1 0 1 0 0 1
> e.mat.L%*%occluded.matrix.L%*% plain.text.matrix.N
[,1]
[1,] 102.66952
[2,] -74.76116
[3,] 35.90735
> inv.e.mat.L%*%e.mat.L%*%occluded.matrix.L%*% plain.text.matrix.N
[,1]
[1,] -0.6723174
[2,] 3.0000000
[3,] -17.6997578
> sum(plain.text.vector.L*plain.text.matrix.N)
[1] 3
How does matrix-based encryption
work?
> plain.text.vector.L
[1] 0 1 1 1 0 0 1
> e.mat.1
[,1]
[1,] 7.13763
> e.mat.1%*%t(matrix(plain.text.vector.L))
[,1] [,2] [,3] [,4] [,5] [,6][,7]
[1,] 0 7.13763 7.13763 7.13763 0 0 7.13763
>e.mat.1%*%t(matrix(plain.text.vector.L))%*%plain.text.matrix.N
[,1]
[1,] 21.41289
>(1/e.mat.1)*e.mat.1%*%t(matrix(plain.text.vector.L))%*%plain.tex
t.matrix.N
[,1]
[1,] 3
Why do we need to occlude the
original plain text vector?
• Is there a significant risk of upsetting or
alienating cohort members or of reducing their
willingness to remain as active participants?
• Does it address topics that fall within the
acknowledged scientific remit of the cohort?
• Is access requested to an infinite resource (data
or cell line DNA) or a depletable resource. If
non-depletable NO assessment of the science
underpinning the application
UK MetaDAC and ALSPAC as illustrative examples
• Wish to control access
• Undesirable loss of intellectual property
• Violation of legal, ethical and other governance
stipulations – particularly disclosure of identity
and/or associated information
• Wish to ensure data used widely for scientific
research and that access procedures are
streamlined
• Data and contextual “data environment” both
crucial and control systems are typically
complex and multi-faceted
How should access be controlled?

Burton - Security, Privacy and Trust

  • 1.
    Security, privacy andtrust: why and how might we control access to research data? Paul Burton, Rebecca Wilson University of Bristol, D2K Research Program NISO Symposium, Denver 11th September, 2016
  • 2.
    • Perhaps themost important message is in the title: • This is a complex challenge involving science, technology, governance and other fundamental social issues • No single solution will be adequate • True transdisciplinary programs of work are essential • Even the most complex and sophisticated of solutions will never offer fully effective exploitation of available data with zero risk of mistakes in managing data or of malign interference Security, privacy and trust
  • 3.
    From Preface: “Our viewhas always been that anonymisation is a heavily context- dependent process and only by considering the data and its environment as a total system (which we call the data situation), can one come to a well informed decision about whether and what anonymisation is needed.”
  • 4.
    Controlling access to researchdata (security): why and when?
  • 5.
    • Who mightshare data? • Distinct generator and user • Share data across a consortium • How is ‘sharing’ achieved? • Physically transfer data to a user • Provide access to analyse • Analysis on-site • Remote analysis • Federated analysis • All interpretations valid and important What does “sharing” research microdata mean?
  • 6.
    • Management ofintellectual property invested and held in data – most areas of research • Legal, ethical and other governance stipulations to protect the welfare of research participants – particularly in health/social/biomedical research • Disclosure of identity • Disclosure of associated information • Balance between these and the societal benefits of streamlined comprehensive data access – which is evolving rapidly with time and social context Why control research data at all?
  • 7.
    The Research Data Pipeline Whenare data “at risk”?
  • 8.
    The Research Data Pipeline Whenare data “at risk”? Dissemination and evidence based action
  • 9.
    The Research Data Pipeline Whenare data “at risk”? Dissemination and evidence based action
  • 10.
    • Risks associatedwith: storage; transmission; use • Accidental v deliberate violations • Direct v inferential disclosure • Risks and remedies lie in the nature of the data themselves and the contextual environment in which the data are to be used – and potentially misused • Mark Elliot, Elaine Mackey, Keith Spicer, Caroline Tudor, the Anonymisation Decision-Making Framework, 2016 Issues to consider
  • 11.
    • Consider theuser(s) • Are he/she/they bona fide researchers? • Consider the application – for example: • Does it violate (or potentially violate) any of the ethical permissions granted to the study or any of the consent forms signed by the participants or their guardians? • Is there a risk it may produce information that may allow individual cohort members to be identified? How to implement control in practice? UK MetaDAC and ALSPAC as illustrative examples
  • 12.
    • Administrative andresearch data held separately • Hard copies of data held in locked storage • Electronic data held on password protected systems with access restricted to those who really need it • All electronic data held in encrypted form • Extensive QC (security of quality) • Multiple back-ups (security of existence) Managing the data and the data environment
  • 13.
    • All datapseudonomised before release • If pseudonomisation scientifically impossible, data can be analysed ‘on site’ • All data transfers encrypted using standard protocols • All linked data released on study-specific IDs • Explicit acknowledgment that no system can guarantee a zero risk of disclosure or misuse of data Managing the data and the data environment
  • 14.
    • A strongunderpinning governance structure is essential. • EAGDA (Expert Advisory Group on Data Access) report 2015 considered, amongst many other key issues that: EAGDA report, 2015 • Governance must be proportionate and context appropriate • Must be transparent, auditable and appealable • Need mutual trust and respect amongst stakeholders
  • 15.
    • Applicants fordata through MetaDAC or from ALSPAC sign up agreeing to governance documents which include statements such as: • Applicants are reminded that the Terms and Conditions for the cohort explicitly forbid any attempt to identify individuals or to compromise or otherwise infringe the confidentiality of information on data subjects and their right to privacy. • Do you understand that you must not pass on any data or samples awarded, or any derived variables or genotypes generated by this application to a third party (i.e. to anybody that is not included in this list of applicants on this project, nor is a direct employee of one of these applicants)? How to implement control in practice? UK MetaDAC and ALSPAC as illustrative examples
  • 16.
    Role of encryptionand other technology-based forms of privacy protection in “open science”
  • 17.
    • When researchdata are very sensitive or are seen as having a particular intellectual property value can we develop technology-based solutions that facilitate access to microdata by enhancing privacy protection so that all intellectual property and governance constraints are met in full while lowering the governance bar? This can promote open science by easing and/or speeding up access requests • Should be seen as an additional component to be applied on top of a data access and governance system that is already well founded • EAGDA report emphasises sustainability Privacy protection in “open science”
  • 18.
    Data Computer DC Analysis Computer AC Single site DataSHIELD 2009:The DataSHIELD challenge Given that microdata are scientifically critical and yet potentially sensitive, can we ensure that the information driving analysis of the data at each centre only ever emerges from the firewall in non-disclosive form? (i) encryption (trivial and non-trivial); (ii) low dimensional (ideally sufficient) statistics Multi-site DataSHIELD horizontally partitioned data
  • 19.
    • One stepanalyses: e.g. ds.table2D - request non-disclosive output from all sources • Multi-step analyses: e.g. ds.lexis – set up and then request • Iterative analyses: e.g. ds.glm - parallel processes linked together by non- identifying summary statistics – e.g. for glm = score vectors and information matrices • Can be used as equivalent to full individual level analysis or to study level meta-analysis The DataSHIELD solution
  • 20.
  • 21.
    Information Matrix Study5 Score vector Study 5 Summary Statistics (1) [36, 487.2951, 487.2951, 149] Information Matrix Study 5 Score vector Study 5 Summary Statistics (1) DataSHIELD
  • 22.
    Σ Information MatrixStudy 5 Score vector Study 5 Summary Statistics (1) [36, 487.2951, 487.2951, 149] Information Matrix Study 5 Score vector Study 5 Summary Statistics (1) DataSHIELD
  • 23.
    b.vector<- c(-0.322, 0.0223, 0.0391,0.535) Analysis commands (2) glm(cc~1+BMI+BMI.456+SNP, family=binomial, start=b.vector, maxit=1) DataSHIELD
  • 24.
    and so on.....
  • 25.
    Updated parameters (4) Σ CoefficientEstimate Std Error Intercept -0.3296 0.02838 BMI 0.02300 0.00621 BMI.456 0.04126 0.01140 SNP 0.5517 0.03295 Final parameter estimates DataSHIELD
  • 26.
    DataSHIELD analysis Direct conventionalanalysis Parameter Coefficient Standard Error bintercept -0.3296 0.02838 bBMI 0.02300 0.00621 bBMI.456 0.04126 0.01140 bSNP 0.5517 0.03295 Coefficients: Estimate Std. Error (Intercept) -0.32956 0.02838 BMI 0.02300 0.00621 BMI.456 0.04126 0.01140 SNP 0.55173 0.03295 Does it work?
  • 27.
    Server-side functions Client-side functions Individuallevel data never transmitted or seen by the statistician in charge, or by anybody outside the original centre in which they are stored. R R R R Web services Web servicesWeb services Data server Opal Finrisk Opal Prevend Opal 1958BC Data server Data server BioSHaRE web site Web services Analysis client DataSHIELD: current implementation for horizontally partitioned data
  • 28.
    IM5: Analysis Computer R R R R Web services WebservicesWeb services Data computer Opal NHS Opal ALSPAC Opal Education Data computer Data computer Regression coefficients = XTY/ XTX XTX: Need to calculate XAXA XAXB XAXC XAXB XBXB XBXC XAXC XBXC XCXC XA XB XAXB XA1 * XB1 + XA2 * XB2 + XA3 * XB3 + …… DataSHIELD: current implementation for vertically partitioned (linked) data
  • 29.
    IM5: Analysis Computer R R R R Web services WebservicesWeb services Data computer Opal NHS Opal ALSPAC Opal Education Data computer Data computer Regression coefficients = XTY/ XTX XTX: Need to calculate XAXA XAXB XAXC XAXB XBXB XBXC XAXC XBXC XCXC MA MB MC XA XB XAXB XA1 * XB1 + XA2 * XB2 + XA3 * XB3 + …… DataSHIELD:current implementationfor vertically partitioned(linked) data plain.text.vector.A plain.text.vector.N 0 1 1 1 0 0 1 1 1 0 1 0 0 1 encryption.matrix [,1] [,2] [,3] [1,] -1.444769 2.495677 -5.322736 [2,] -1.355529 -9.369041 2.687347 [3,] 4.603762 -3.622044 -2.817478 occluded.matrix.A [,1] [,2] [,3] [1,] -1.4546711 0 4.0722205 [2,] 6.4809785 1 -4.5814726 [3,] 4.4954801 1 -8.7036260 [4,] 0.1995684 1 -8.6872205 [5,] -6.4060220 0 -6.6471777 [6,] -0.5164345 0 -0.2564673 [7,] -5.8981933 1 -8.5032852
  • 30.
    IM5: Analysis Computer R R R R Web services Opal NHS Opal ALSPAC Opal Education Datacomputer Data computer Regression coefficients = XTY/ XTX XTX: Need to calculate XAXA XAXB XAXC XAXB XBXB XBXC XAXC XBXC XCXC MA XT A MAXT AXBMB MA XA XB MB (MA)-1 MAXT AXBMB (MB)-1 = XAXB DataSHIELD:current implementationfor vertically partitioned(linked) data plain.text.vector.A plain.text.vector.N 0 1 1 1 0 0 1 1 1 0 1 0 0 1 encryption.matrix [,1] [,2] [,3] [1,] -1.444769 2.495677 -5.322736 [2,] -1.355529 -9.369041 2.687347 [3,] 4.603762 -3.622044 -2.817478 occluded.matrix.A [,1] [,2] [,3] [1,] -1.4546711 0 4.0722205 [2,] 6.4809785 1 -4.5814726 [3,] 4.4954801 1 -8.7036260 [4,] 0.1995684 1 -8.6872205 [5,] -6.4060220 0 -6.6471777 [6,] -0.5164345 0 -0.2564673 [7,] -5.8981933 1 -8.5032852
  • 31.
  • 32.
    Becca Wilson If peoplewant to know technical details about DataSHIELD or methods for secure data sharing/analysis: see 12th September 11:30 - 13:00 Secure Multiparty Computation for Statistical Analysis of Private Data Demetris Avraam Poster #12 RDA poster session Wednesday, Thursday, Friday – DataSHIELD: a method for privacy protected analysis of individual level data RDA Working Group for Data Security and Trust RDA 8th Plenary in the same venue as the NISO symposium. Session on 17th September 11:00 - 12:30. We are running a survey to gather information on current data security practices in our community. The survey is available at www.bit.ly/dash-ing (see below) Data to Knowledge (D2K) Research Group. Contact details: @Data2Knowledge there is also a "contact us" page on www.datashield.ac.uk Setting up a professional community for stakeholders in health data sharing DASH-ING: DAta Sharing for Health - INnovation Group The website will initially (and temporarily) be at: www.bit.ly/dash-ing This webpage contains questions for RDA survey and link to join the professional community Additional opportunities for interaction
  • 33.
    THANK YOU FORLISTENING
  • 34.
    > plain.text.vector.L [1] 01 1 1 0 0 1 > plain.text.vector.N [1] 1 1 0 1 0 0 1 > sum(plain.text.vector.L*plain.text.matrix.N) [1] 3 >t(matrix(plain.text.vector.L))%*%matrix(plain.text.vector.N) [,1] [1,] 3 How does matrix-based encryption work?
  • 35.
    > occluded.matrix.L >plain.text.vector.L [,1] [,2] [,3] [1] 0 1 1 1 0 0 1 [1,] -1.4546711 0 4.0722205 [2,] 6.4809785 1 -4.5814726 [3,] 4.4954801 1 -8.7036260 [4,] 0.1995684 1 -8.6872205 [5,] -6.4060220 0 -6.6471777 [6,] -0.5164345 0 -0.2564673 [7,] -5.8981933 1 -8.5032852 > e.mat.L [,1] [,2] [,3] [1,] -1.444769 2.495677 -5.322736 [2,] -1.355529 -9.369041 2.687347 [3,] 4.603762 -3.622044 -2.817478 > e.mat.L%*%occluded.matrix.L [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,] -19.57369 17.51813 42.32785 48.44713 44.636397 2.11123627 56.277949 [2,] 12.91532 -30.46620 -38.85246 -32.98514 -9.179719 0.01082581 -24.225142 [3,] -18.17035 39.12303 41.59635 21.77277 -10.763524 -1.65495077 -6.818104 How does matrix-based encryption work?
  • 36.
    > e.mat.L%*%occluded.matrix.L [,1] [,2][,3] [,4] [,5] [,6] [,7] [1,] -19.57369 17.51813 42.32785 48.44713 44.636397 2.11123627 56.277949 [2,] 12.91532 -30.46620 -38.85246 -32.98514 -9.179719 0.01082581 -24.225142 [3,] -18.17035 39.12303 41.59635 21.77277 -10.763524 -1.65495077 -6.818104 > plain.text.vector.N [1] 1 1 0 1 0 0 1 > e.mat.L%*%occluded.matrix.L%*% plain.text.matrix.N [,1] [1,] 102.66952 [2,] -74.76116 [3,] 35.90735 > inv.e.mat.L%*%e.mat.L%*%occluded.matrix.L%*% plain.text.matrix.N [,1] [1,] -0.6723174 [2,] 3.0000000 [3,] -17.6997578 > sum(plain.text.vector.L*plain.text.matrix.N) [1] 3 How does matrix-based encryption work?
  • 37.
    > plain.text.vector.L [1] 01 1 1 0 0 1 > e.mat.1 [,1] [1,] 7.13763 > e.mat.1%*%t(matrix(plain.text.vector.L)) [,1] [,2] [,3] [,4] [,5] [,6][,7] [1,] 0 7.13763 7.13763 7.13763 0 0 7.13763 >e.mat.1%*%t(matrix(plain.text.vector.L))%*%plain.text.matrix.N [,1] [1,] 21.41289 >(1/e.mat.1)*e.mat.1%*%t(matrix(plain.text.vector.L))%*%plain.tex t.matrix.N [,1] [1,] 3 Why do we need to occlude the original plain text vector?
  • 38.
    • Is therea significant risk of upsetting or alienating cohort members or of reducing their willingness to remain as active participants? • Does it address topics that fall within the acknowledged scientific remit of the cohort? • Is access requested to an infinite resource (data or cell line DNA) or a depletable resource. If non-depletable NO assessment of the science underpinning the application UK MetaDAC and ALSPAC as illustrative examples
  • 39.
    • Wish tocontrol access • Undesirable loss of intellectual property • Violation of legal, ethical and other governance stipulations – particularly disclosure of identity and/or associated information • Wish to ensure data used widely for scientific research and that access procedures are streamlined • Data and contextual “data environment” both crucial and control systems are typically complex and multi-faceted How should access be controlled?