Provenance abstraction for implementing security: Learning Health System and securing provenance of health data

Discussion of provenance usage in the Learning Health System paradigm, as implemented in the TRANSFoRm project, with focus on security requirements and how they can be addressed using provenance graph abstraction.

Published in: Software

  1. 1. Provenance abstraction for implementing security policies Learning Health System and securing provenance of health data Dr Vasa Curcin King’s College London
  2. 2. Overview • Learning Health System • LHS requirements for provenance data • TRANSFoRm project • Transformation-oriented Access Control Language for Provenance (TACLP)
  3. 3. Learning Health System “ ... one in which progress in science, informatics, and care culture align to generate new knowledge as an ongoing, natural by- product of the care experience, and seamlessly refine and deliver best practices for continuous improvement in health and health care.” (Institute of Medicine) We can’t afford to waste data!
  4. 4. A Learning Health System for the Nation Pharmaceutical Firm Beacon Community Integrated Delivery System Community Practice Health Information Organization Health Center Network Federal Agencies State Public Health Governance Patient Engagement Trust Analysis Dissemination Learning Health System Defining functions of a LHS are to: 1. routinely and securely aggregate data from disparate sources 2. convert the data to knowledge 3. disseminate that knowledge, in actionable forms, to everyone who can benefit from it. c/o C. Friedman
  5. 5. Learning Health System take-up • US medical/academic centres o Mayo, Duke, Vanderbilt o PCORI • National data aggregators o Clinical Practice Research Datalink o NIVEL • EHR vendors o CSC, Asseco, TPP, InPractice Systems • European academic-industrial collaborations o TRANSFoRm, EHR4CR, Semantic HealthNet …and Bill
  6. 6. Example: Clinical trial challenges • Major motivation for the LHS work • Trials too expensive and difficult to run • Efficacy-effectiveness gap (EEG) o Disconnect between outcomes from clinical trials and information needed for clinical practice o Interaction of drug effect and real-life contextual factors o Challenge to identify contextual factors • LHS provides context and workflow
  7. 7. LHS for Clinical Trials • EHR integration o Eligibility checking done automatically from EHR data o eCRFs partially filled based on EHR information o All collected data stored in the EHR system as well as the research database • Closing the loop o eCRF data enriches the EHR o Helps the clinician o Adds value to the EHR system • Data does not go to waste!
  8. 8. Trust in the LHS • Research community is struggling to ensure transparency and correctness of published research • Reasons complex and interleaving (positive bias, intractable analysis, deluge of journals) • Bayer Healthcare team published work showing that only 25% of the academic studies they examined could be replicated o Prinz et al. Nat. Rev. Drug Discov. 10, 712, 2011 • Of 53 oncology studies from 2001-2011, each highlighting big new apparent advances in the field, only 11% (6) could be robustly replicated. o Begley & Ellis Nature 483, 531–533, 2012
  9. 9. Trust in the LHS (cont.) • The problem is by no means restricted to preclinical studies • Twelve randomised clinical trials tested 52 observational claims and failed to reproduce a single one o Young SS, Karr A. Deming, data and observational studies. Significance Sep 2011; 8(3):116–120 • Replication of 100 experiments published in 2008 in three high-ranking psychology journals – fewer than half of the findings replicated o Estimating the reproducibility of psychological science. Science Aug 2015; 349(6251) • Random sample of 441 biomedical journal articles 2000–2014: none made all their data available, one provided the full protocol, the majority did not disclose funding or conflicts of interest o Iqbal et al. Reproducible Research Practices and Transparency across the Biomedical Literature. PLoS Biology 2016; 14(1) • Cost of irreproducible research in life science is estimated at $28 billion per year in the U.S. o Freedman LP, Cockburn IM, Simcoe TS. The Economics of Reproducibility in Preclinical Research. PLOS Biology Jun 2015; 13(6)
  10. 10. Data in the Learning Health System • Each component in the healthcare system produces and consumes data: • Epidemiological research using record linkages • Research data embedded in the EHR • Decision support for diagnosis • Provenance infrastructure required to support all these domains • Specific research data o Clinical trials o Controlled populations o Well-defined questions • Routinely collected data o EHR systems o Wide coverage o Vast quantity o May lack in detail and quality • Actionable data o Distilled scientific findings o Usable in clinical practice o Decision support
  11. 11. TRANSFoRm project • €7.5M European Commission 2010-2015 • Funded under the Patient Safety Work Program of FP7 • Developing methods, models, services, validated architectures and demonstrations to support: o Epidemiological research using GP records, including genotype-phenotype studies and other record linkages o Clinical trials embedded in the EHR o Decision support for diagnosis www.transformproject.eu
  12. 12. Middleware Secure data transport RCT tools (Electronic Data Collection) Epidemiological study tools (Data queries) Authentication framework Diagnostic support tools Data source connectivity module Provenance framework Vocabulary service TRANSFoRm software landscape
  13. 13. Use case 1: Type 2 Diabetes • Research Question: In type 2 diabetic patients, are selected single nucleotide polymorphisms (SNPs) associated with variations in drug response to oral antidiabetic drugs (Sulfonylurea)? • Design: Case-control study • Data: primary care databases (phenotype data) pre-linked to genomic databases (genetic risk factors) – data federation
  14. 14. Use case 2: Gastro-oesophageal reflux disease (GORD) • Research Question: What gives the best symptom relief and improvement in Quality of Life: continuous or on-demand Proton Pump Inhibitor use? • Design: Randomised Controlled Trial (RCT) • Data: Collection through EHR and web-based questionnaire – electronic case report forms and mobile Patient Reported Outcome Measures • Provenance and security
  15. 15. Use case 3: Diagnostic Decision Support • Early diagnostic suggestions for presenting problems: • chest pain • abdominal pain • shortness of breath • Clinical Prediction Rule web service (with underlying ontology) • Prototype Decision Support System integrated with a commercial electronic health record system • Vision by InPractice Systems
  16. 16. Provenance challenge for TRANSFoRm • Viable methods for adoption in a heterogeneous software environment o No shared workflow middleware to rely on • Need to achieve domain specificity • Able to demonstrate conformance to standards o Title 21 of the Code of Federal Regulations; Electronic Records; Electronic Signatures (21 CFR Part 11) o Good Clinical Practice (GCP) o EudraLex Vol. 4 Annex 11: Computerised Systems in EU o CONSORT, STROBE, RECORD
  17. 17. Semantic annotations • Semantic concepts in the provenance graph defined using TRANSFoRm ontologies: o Clinical Research Information Model (CRIM) o Software infrastructure ontology o Clinical evidence ontology • Ontology concepts annotations on provenance nodes • Provenance templates define domain actions that map to provenance fragments PCROM (UML Model) Randomised Clinical Trial Ontology (RCTO) Randomised Clinical Trial Provenance Ontology (RCTPO)
  18. 18. Provenance templates Provenance database Provenance server Existing tools 1. Tools are agnostic to provenance representation 2. Service invocation matches some provenance template in Provenance server 3. Template is instantiated into a provenance graph fragment with OWL concept annotations 4. Graphs merged inside the database API service calls OPM graphs annotated with OWL
  19. 19. Example: Provenance of diagnostic recommendation
  20. 20. Provenance security • Use a single provenance graph for: o Full trial audit o Reporting studies o Publication review o Collaborators o Readers • Need to abstract parts of the graph • Access control and view generation for provenance graphs o Future Generation Computer Systems, Volume 49, August 2015, Pages 8-27 Roxana Danger, Vasa Curcin, Paolo Missier, Jeremy Bryans
  21. 21. Basic idea • The aim of an access control strategy is not only to determine whether the resource can be viewed, but to construct a view of the graph which satisfies the security constraints • The goal is to retain the maximum amount of information • NB Based on TRANSFoRm use cases but not implemented in the live system
  22. 22. Access control • Ensuring that a principal (person, process, etc.) can only access the services or data in a system that they are authorized to access • Implemented through security policies that try to enforce a certain protection goal, such as preventing unauthorized disclosure (secrecy) and intentional or accidental unauthorized changes (integrity) • Authorizations for some resource can be: o Positive (allow) o Negative (deny)
  23. 23. Access control • Two classical approaches: o Closed policy • Deny-by-default • Access to a resource is only granted if a corresponding positive authorization policy exists o Open policy • Permit-by-default • Access is granted unless a corresponding negative authorization policy exists • Combined approach used to support policy exceptions • Conflict resolution needed if multiple policies apply, e.g. o denials-take-precedence o most-specific-takes-precedence o priority levels o time-dependent access
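A minimal sketch of the closed and open policies described on this slide, combined with denials-take-precedence conflict resolution. The names `Decision` and `evaluate` are hypothetical and chosen for illustration; they do not come from the talk or from any access control library.

```python
from enum import Enum


class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"


def evaluate(matching_authorizations, closed_policy=True):
    """Combine the authorizations that matched a request (hypothetical helper).

    matching_authorizations: list of Decision values from applicable policies.
    closed_policy: True for deny-by-default, False for permit-by-default.
    """
    # Conflict resolution: denials-take-precedence.
    if Decision.DENY in matching_authorizations:
        return Decision.DENY
    if Decision.PERMIT in matching_authorizations:
        return Decision.PERMIT
    # No policy applies: fall back to the default of the chosen approach.
    return Decision.DENY if closed_policy else Decision.PERMIT
```

Other resolution strategies listed on the slide (most-specific-takes-precedence, priority levels) would replace the first two checks with a different selection rule.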
  24. 24. Access control languages for provenance • Qin Ni et al o Semantic description of subjects (user roles) and resources to be accessed o Conditions under which restrictions are applied o Four different types of access permissions • Cadenhead et al o Added regular expressions for resource and condition descriptions • Transformation-oriented Access Control Language for Provenance (TACLP) o Allows users to define subgraphs to be transformed, with three different levels of abstraction (namely hide, minimal and maximal)
  25. 25. Indirect relations • Introduce some new relations to be used in abstraction
  26. 26. External effects and causes • External effects and causes of the set of nodes S w.r.t. a set of nodes R o Set of nodes that represent the immediate effects/causes of S that would be affected by removal of nodes in R from the graph V (𝑆 ⊆ 𝑅 ⊆ 𝑉) o If S=R, then denote as ef(R) and ca(R)
  27. 27. External effects and causes
  28. 28. Basic operations • Node removal o Subgraph needs to be hidden o e.g. if it is unnecessary for an analysis or user access to it has been restricted. • Node replacement o removing details of data and operations in a subgraph while retaining some information (abstract entity) of the existence of such subgraph.
  29. 29. Operation: node removal • Let Prov = (V , E , type) and R ⊆ V be a set of nodes to be removed. Result is a new provenance graph Prov′ =(V′,E′,type′), such that:
  30. 30. Operation: node replacement • As before, with operation A_R replacing node set R with node v_a
  31. 31. Abstract nodes and edges • Dummy nodes introduced during entity replacement • Preserve the causality of the rest of the graph • Two types of dependencies: o Indirect • Denoted with double lines • Represent multi-step dependencies (wdf+, u+, wgb+, wtb+) o Soft dependencies • Denoted with double dashed lines • Generic transitive relationship which is not one of the above
  32. 32. Removal and Replacement Replace (A,B) Remove (A,B)
  33. 33. Removal and Replacement Replace (A,B) Remove (A,B)
  34. 34. False dependencies • False dependencies introduce a previously non-existent path in the new graph, e.g. removing A, B
  35. 35. Causality preserving transformation • A transformation is called causality preserving if it does not introduce false dependencies. • Given a provenance graph and a set of entities to be abstracted/hidden, the question is how can these entities be joined or removed from the graph using only causality-preserving transformations?
  36. 36. Causality preserving partition and transformation • Given a set of nodes R ⊆ V, a causality preserving partition ℘ of R is such that removing or replacing any set of nodes 𝑃 ∈ ℘ will not introduce false causal dependencies • A graph transformation by partition ℘ of R is then a sequential application of Rem_P or Rep_P • The necessary and sufficient condition for such a transformation to be causality preserving is that for each 𝑃 ∈ ℘ all of P’s external causes and effects are connected
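The condition on this slide can be sketched as a reachability check. This is one reading of "all of P's external causes and effects are connected", not the formal definition from the paper: it assumes an edge `(u, v)` means "u depends on v", and it treats the external causes and effects of P as its direct neighbours outside P. All function names are hypothetical.

```python
def reachable(edges, start):
    """Nodes reachable from start by following dependency edges (u depends on v)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def external_causes(edges, P):
    """Nodes outside P that some node in P directly depends on."""
    return {v for u, v in edges if u in P and v not in P}


def external_effects(edges, P):
    """Nodes outside P that directly depend on some node in P."""
    return {u for u, v in edges if v in P and u not in P}


def is_causality_preserving(edges, P):
    """True if every external effect of P already reaches every external cause."""
    causes = external_causes(edges, P)
    return all(causes <= reachable(edges, effect)
               for effect in external_effects(edges, P))


# X depends on A, A on C; Y depends on B, B on D.
edges = [("X", "A"), ("A", "C"), ("Y", "B"), ("B", "D")]
print(is_causality_preserving(edges, {"A"}))       # A alone
print(is_causality_preserving(edges, {"A", "B"}))  # A and B collapsed together
```

In this example, collapsing A and B into one abstract node would make X appear to depend on D, which is a false dependency, so the second check fails while the singleton passes.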
  37. 37. Optimal causality preserving partition • Default partition of R consists of singletons, i.e. each node in R is a set in the partition. • Optimal partition is such that none of its sets have the same sets of external causes and effects w.r.t. R • Partitioning algorithm o Step 1, determine external causes and effects for default partition o Step 2, gradually merge the partitions until optimal.
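A rough sketch of the two partitioning steps, under stated assumptions: an edge `(u, v)` means "u depends on v"; the external causes and effects of a node w.r.t. R are taken to be the nearest nodes outside R reached through paths that stay inside R; and nodes with identical cause and effect sets are grouped directly by signature rather than merged pair by pair. The helper names are hypothetical and the published algorithm may differ in detail.

```python
def boundary(adjacency, node, R):
    """Nodes outside R reached from node via paths whose interior stays in R."""
    found, seen, stack = set(), {node}, [node]
    while stack:
        current = stack.pop()
        for nxt in adjacency.get(current, ()):
            if nxt in R:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
            else:
                found.add(nxt)
    return frozenset(found)


def partition(edges, R):
    """Group the nodes of R by their external causes and effects w.r.t. R."""
    R = set(R)
    causes_of, effects_of = {}, {}
    for u, v in edges:
        causes_of.setdefault(u, set()).add(v)   # u depends on v
        effects_of.setdefault(v, set()).add(u)  # v influences u
    groups = {}
    for node in sorted(R):
        # Step 1: external causes and effects of the singleton {node}.
        signature = (boundary(causes_of, node, R), boundary(effects_of, node, R))
        # Step 2: merge singletons that share the same signature.
        groups.setdefault(signature, set()).add(node)
    return list(groups.values())


edges = [("X", "A"), ("A", "B"), ("B", "C"), ("Y", "D"), ("D", "E")]
blocks = partition(edges, {"A", "B", "D"})
```

Here A and B share the external cause C and the external effect X, so they end up in one block, while D stays on its own.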
  38. 38. Provenance graph transformation algorithm • Once the partition is computed, the transformations are iteratively applied to each element in the partition • Labels input provides names for generated abstract nodes • Levels input provides abstraction level for each partition o Hide • remove operation o Minimum abstraction, maximum abstraction • replace operation • isolated singletons removed as a special case.
  39. 39. Computational efficiency • Transformation algorithm performance depends on the performance of the partition algorithm • The other steps are linear in the cardinality of the set of partitions ℘ and its edges • The partition algorithm considers pair-wise combinations of nodes • Overall complexity is O(|R|²), where R is the set of nodes to abstract
  40. 40. Experimental results • Provenance view transformation algorithm was implemented in Python 2.7 using the NetworkX API • Experiments were executed on Ubuntu 12.04, Intel Core i7-3687U CPU at 2.10 GHz with 8 GB RAM • Synthetic provenance graphs used, randomly generating edges for each node within the degree range 2-10 • Two parameters: o the percentage of nodes to abstract (from 5 to 25 with a step of 5) o the percentage of nodes to abstract which are causally dependent (from 0 to 100 with a step of 25) • Each configuration was executed 10 times and the plots presented show the averages of these executions
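The synthetic graphs could be produced along the following lines. This is an illustrative sketch only: the slides do not describe the generator beyond the degree range, so the function name, the seed parameter, and the choice to draw each node's causes from earlier nodes (which keeps the graph acyclic) are all assumptions. It uses the standard library rather than NetworkX.

```python
import random


def synthetic_provenance_graph(n_nodes, min_degree=2, max_degree=10, seed=None):
    """Random acyclic dependency graph as a set of (effect, cause) edges.

    Each node picks a random number of causes within the degree range,
    drawn only from lower-numbered nodes so that no cycle can form.
    """
    rng = random.Random(seed)
    edges = set()
    for node in range(1, n_nodes):
        # Early nodes have fewer possible causes than the requested degree.
        degree = min(rng.randint(min_degree, max_degree), node)
        for cause in rng.sample(range(node), degree):
            edges.add((node, cause))
    return edges


graph = synthetic_provenance_graph(50, seed=1)
```

The set of nodes to abstract would then be sampled from this graph according to the two percentages listed on the slide.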
  41. 41. Performance behaviour • Execution time (Y) in seconds as a function of the number of nodes (X) and the percentage of nodes to abstract (Z) • Quadratic time
  42. 42. Use case: Access to health data • Access control for the provenance data collected from Electronic Health Record (EHR) and clinical trial systems • Rules: o Auditors. Healthcare system auditors or law enforcement agencies can access the whole provenance graph during the auditing process. o Family doctors (FDs) and patients. Electronic health records and their data provenance can only be accessed by patients during weekends, and by FDs during weekdays. o Active FDs. Active FDs have access to the EHRs of their patients and the associated provenance data. o Clinical trial 1. If some data comes from a clinical trial, the GP needs to be a participant in the trial to see the subgraph associated with that trial. o Clinical trial 2. Patients do not have access to clinical trial processes. o Laboratory. Patients do not have access to laboratory processes. o Automatic diagnosis recommendation. Patients have no access to any information related to the automatic diagnosis recommendation nor to the graph segment connecting it with the clinical evidence.
  43. 43. TACLP • Transformation-oriented Access Control Language for Provenance (TACLP) • Extends the works of Ni and Cadenhead by introducing transformations • A policy consists of: o Target o Effect o Transformation o Condition (optional) o Obligation (optional)
  44. 44. TACLP Target • Subject element o Set of users (subject element) to which the policy should be applied, expressed through IRI references • Record element o Set of resources to which the policy should be applied, expressed through IRI references • Restriction element (optional) o A conditional expression under which the policy is applied o Either a relational comparison between a value in a property path and a literal, or a full logical expression. • Scope element (optional) o If the policy is ‘transferable’ or ‘non-transferable’ with respect to subjects o Whether it applies to all the ancestors of matched elements in the graph, or to the matched elements only.
  45. 45. TACLP Effect • Specifies the intended outcome • Four possibilities: o Absolute permit guarantees access to the graph regardless of the effect of other policies • e.g. for allowing access to auditors or law enforcement agencies, and avoids the need for additional conditions in deny policies o Deny guarantees that certain parts of the graph will not be accessed by users in the subject element. o Necessary permit is used to describe the necessary, but not always sufficient, conditions for accessing certain parts of the graphs o Permit is used to describe those parts of the graph that can be accessed if there are no other policies denying access to it.
  46. 46. TACLP Transformation • How to transform the provenance graph in order to hide certain resources • Specification of which nodes need to be hidden and Removal/Replace operations to be applied to them • Set of policies comprising o Policy type (target, record, condition, effect, transformation element and obligation) o Policy evaluation type (deny-takes-precedence or permit-takes-precedence)
  47. 47. TACLP Transformation • Abstraction level o Hide • matched nodes of the subgraph have to be completely hidden (removed) from the graph • Remove transformation is applied; o Minimum abstraction • Replace transformation is applied • No caused-by relationship (soft dependencies) will appear in the transformed graph. o Maximum abstraction • Replace transformation is applied • Soft dependencies can appear in the transformed graph.
  48. 48. Access control evaluation algorithms • Aim to produce an abstracted graph that satisfies the constraints • Deny-takes-precedence 1. Absolute permit policies evaluated first 2. Necessary permit and deny policies 3. Permit policies • Allow-takes-precedence 1. Absolute permit evaluated first 2. Necessary permit policies 3. Permit policies 4. Deny policies
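The two evaluation orders on this slide can be sketched as a stable sort over policies by effect. The `Policy` class, the stage tables, and the example policy names are hypothetical and only illustrate the ordering; how each policy is actually evaluated against the graph is not covered here.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    ABSOLUTE_PERMIT = "absolute permit"
    NECESSARY_PERMIT = "necessary permit"
    PERMIT = "permit"
    DENY = "deny"


# Evaluation stages as listed on the slide; effects in one set share a stage.
DENY_TAKES_PRECEDENCE = [
    {Effect.ABSOLUTE_PERMIT},
    {Effect.NECESSARY_PERMIT, Effect.DENY},
    {Effect.PERMIT},
]
ALLOW_TAKES_PRECEDENCE = [
    {Effect.ABSOLUTE_PERMIT},
    {Effect.NECESSARY_PERMIT},
    {Effect.PERMIT},
    {Effect.DENY},
]


@dataclass
class Policy:
    name: str
    effect: Effect


def evaluation_order(policies, stages):
    """Order policies by stage; sorted() is stable, so ties keep input order."""
    rank = {effect: i for i, stage in enumerate(stages) for effect in stage}
    return sorted(policies, key=lambda policy: rank[policy.effect])


policies = [
    Policy("patients-no-lab", Effect.DENY),
    Policy("fd-weekdays", Effect.PERMIT),
    Policy("auditors", Effect.ABSOLUTE_PERMIT),
    Policy("trial-participant", Effect.NECESSARY_PERMIT),
]
deny_first = [p.name for p in evaluation_order(policies, DENY_TAKES_PRECEDENCE)]
allow_first = [p.name for p in evaluation_order(policies, ALLOW_TAKES_PRECEDENCE)]
```

The example policies loosely mirror the health data rules from the use case slide: auditors always come first, and only the position of the deny policy changes between the two strategies.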
  49. 49. Example: Source provenance graph
  50. 50. Example: Abstracted provenance graph
  51. 51. Summary • The Learning Health System presents a new set of challenges for medical and informatics communities • Provenance can help establish trust in the LHS • Methods needed to verify trust • Abstraction of provenance traces needed to address requirements of multiple stakeholders o Researchers o Regulators o Publishers • Future work o Projects running on provenance of decision support and visual analytics for health data o Looking for partnerships to investigate applications of the security work
  52. 52. Acknowledgements • Thanks to: o Roxana Danger o Paolo Missier o Jeremy Bryans o Derek Corrigan o Brendan Delaney
  53. 53. Questions?
  54. 54. Thank you!