Provenance abstraction for implementing security: Learning Health System and securing provenance of health data

Discussion of provenance usage in the Learning Health System paradigm, as implemented in the TRANSFoRm project, with focus on security requirements and how they can be addressed using provenance graph abstraction.

Published in: Software
  • The US health system is going digital: ~30% now, ~80% by 2019
    - In many EU countries’ primary care, EHR usage is 100%, with more than 50% of practices completely paperless
    • If each care provider, patient, and researcher used their own data only for immediate needs, we would fail to realize its potential
    • If comparable data are shared, we can learn and improve
    • The key is to figure out how to do this routinely.
    We can’t afford to waste data.

  • The overall goal is a healthcare system that draws on the best evidence to provide the most appropriate care for each patient, focuses on prevention and health promotion, delivers the most value, and adds to learning and improvement with each care experience

    LHS:
    “ ... one in which progress in science, informatics, and care culture align to generate new knowledge as an ongoing, natural by-product of the care experience, and seamlessly refine and deliver best practices for continuous improvement in health and health care.” (Institute of Medicine)


    Examples:
    1. Nationwide post-market surveillance of a new drug quickly reveals that personalized dosage algorithms require modification. A modified decision support rule is created and is implemented in EHR systems.
    2. During an epidemic, new cases reported directly from EHRs. As the disease spreads into new areas, clinicians are alerted.
    3. A patient faces a difficult medical decision. She bases that decision on the experiences of other patients like her.

    Key is to move beyond individual knowledge silos – there are some wonderful solutions out there, particularly in the US, which do brilliant work locally, but do not consider the interoperability with the wider world. Researchers are increasingly asking: how portable is this, and how can we pick it up?
  • Feedback loop
  • Part of a wider reproducibility challenge

    Potential reasons:
    incorrect or inappropriate statistical analysis of results, or insufficient sample sizes
    pressure to publish sometimes results in negligence over the control or reporting of experimental conditions
    bias towards publishing positive results
    many initially rejected papers get published in other journals without substantial changes or improvements

    Important not to overreact: being unable to reproduce the findings does not automatically mean that a study is flawed; however, it does open the research to questions.

    The number of citations for the irreproducible findings actually outpaced those for the reproducible findings! (ibid)
  • Front end tools sharing the same set of reusable components in middleware and data connectivity package
  • We start from domain ontologies, and map them to provenance ontologies, using OPM concepts

    Our project predates the current W3C standard, but the mapping from OPM to PROV is straightforward (the other way, not necessarily so)

    Challenge was that our tools are heterogeneous, some are user-facing, and don’t share an execution environment.

    Thus we introduced provenance templates.
  • Provenance server exports service interface based on the templates
    Abstract provenance graph fragments with semantic annotations
    Client applications provide details (investigators, data set references, study parameters)
    Sent to the provenance interface and converted into full provenance graphs and stored in the database

    This is a very non-intrusive way of embedding provenance into a software ecosystem.
  • Our work builds upon existing work in the field.
  • Key are generic causality relations and indirect relations
    An indirect relation essentially composes the original relation with the transitive closure of was-derived-from
  • R is the set that is getting removed, we are observing the effects and causes from its subset S
  • Highlight the difference when R=S
  • Entity removal transformation (RemR) is used when a subgraph needs to be hidden, e.g. if it is unnecessary for an analysis or user access to it has been restricted.
  • Entity replacement (RepR) is used for removing details of data and operations in a subgraph while retaining some information (an abstract entity) about the existence of such a subgraph.
  • Removal and replacement transformations do not introduce cycles in the new graph as long as the original graph is acyclic, as OPM provenance graphs are. However, using these transformations on an arbitrary set of nodes can introduce false dependencies, that is, causal links that were not present in the original graph.
  • Soft edge introduced in remove
  • Removal and replacement transformations do not introduce cycles in the new graph
    However, using these transformations on an arbitrary set of nodes can introduce false dependencies, that is, causal links that were not present in the original graph.
    The entity replacement transformation introduces false dependencies when entities A and B are joined.

    In this case, paths from 2 to 5, D, and E do not exist in the original graph.
  • Proof in the paper
  • Not in the live system
  • Remove this?
  • Use of c* - the most general connectivity in the provenance graph
  • The graph shows the evolution of a patient’s EHR during two visits and the subsequent actions. In the first visit, the patient (Ag1) visited a general practitioner (Ag3), and an EHR system (Ag2) was used to record all the details of the visit. First, a new item creation process (P1) was executed, generating a new EHR version (EHR v20 - A2) based on the previous version of the patient’s EHR (EHR v19 - A1). After the patient detailed the symptoms, the GP gave them a prescription (A3) to be followed, created a blood test form (A4) for the test to be performed, and updated the data in the EHR system, generating a new version of the record (EHR v21 - A5). The blood test form was used to prepare the instrumentation and conduct the measurement (P3). All these operations were controlled by the laboratory system (Ag4) and a laboratory technician (Ag5). As part of this process, a laboratory condition report was generated (A6), and it triggered the blood test report creation process (P5), which generated the test report (A7) and a new version of the EHR containing the results of the test (EHR v22 - A9). The test and laboratory condition reports (A7 and A6) were both used during the creation of an electronic Case Report Form, eCRF (P4), as the patient is involved in a clinical trial and their progress is also followed by the clinical trial researcher (Ag6). The result of this action is the eCRF (A8).

    In the second visit, a new EHR item process (P6) was executed again, producing the new version of the EHR (A10). Following this, the doctor used a decision support system (Ag10) to confirm their diagnostic hypothesis. They opened the application, entered the patient details (P7), and a set of diagnostic cues (A11) was extracted from the EHR of the patient. These were then compared (P8) with the clinical evidence repository (A12) of the decision support system. A diagnostic recommendation (A13) was then obtained and given as a possible option to the GP, who used it to generate their final diagnosis (A15). A variable containing the recommendation chosen by the GP (A14) was also generated and maintained by the decision support tool. Once the GP had the diagnosis, they proceeded to update the data in the EHR system, generating a new prescription for the patient (A16), and a new version of the EHR (A17).




  • Notice that the labels properly describe the aim of the abstracted entities in the cases of laboratory and clinical trial, and the whole subgraph corresponding to the automatic diagnosis decision support processing is removed.

  • Ultimately, the LHS is about scaling up the health system, and consequently the associated research that the health system is built upon. If this scaling is to succeed, we have to build mechanisms for verifying trust in the system into our research instruments. In a research world increasingly reliant on electronic tools, provenance gives us a lingua franca for achieving traceability, which we have shown to be essential to building these mechanisms. The idea was evaluated in a provenance infrastructure implemented in the TRANSFoRm project in three distinct LHS domains: clinical trials, decision support systems and cohort studies. The challenge now is to address the provenance gap between the provenance metadata collected and the reporting requirements of different domains, and this will require a joint effort by a range of stakeholders, including medical scientists, informaticians, publishers and regulators. This work is essential if the quality of translation from research into practice in the LHS is to improve, rather than deteriorate, with the growing volume of data and research.
  • Provenance abstraction for implementing security: Learning Health System and securing provenance of health data

    1. 1. Provenance abstraction for implementing security policies Learning Health System and securing provenance of health data Dr Vasa Curcin King’s College London
    2. 2. Overview • Learning Health System • LHS requirements for provenance data • TRANSFoRm project • Transformation-oriented Access Control Language for Provenance (TACLP)
    3. 3. Learning Health System “ ... one in which progress in science, informatics, and care culture align to generate new knowledge as an ongoing, natural by-product of the care experience, and seamlessly refine and deliver best practices for continuous improvement in health and health care.” (Institute of Medicine) We can’t afford to waste data!
    4. 4. A Learning Health System for the Nation Pharmaceutical Firm Beacon Community Integrated Delivery System Community Practice Health Information Organization Health Center Network Federal Agencies State Public Health Governance Patient Engagement Trust Analysis Dissemination Learning Health System Defining functions of a LHS are to: 1. routinely and securely aggregate data from disparate sources 2. convert the data to knowledge 3. disseminate that knowledge, in actionable forms, to everyone who can benefit from it. c/o C. Friedman
    5. 5. Learning Health System take-up • US medical/academic centres o Mayo, Duke, Vanderbilt o PCORI • National data aggregators o Clinical Practice Research Datalink o NIVEL • EHR vendors o CSC, Asseco, TPP, InPractice Systems • European academic-industrial collaborations o TRANSFoRm, EHR4CR, Semantic HealthNet …and Bill
    6. 6. Example: Clinical trial challenges • Major motivation for the LHS work • Trials too expensive and difficult to run • Efficacy-effectiveness gap (EEG) o Disconnect between outcomes from clinical trials and information needed for clinical practice o Interaction of drug effect and real-life contextual factors o Challenge to identify contextual factors • LHS provides context and workflow
    7. 7. LHS for Clinical Trials • EHR integration o Eligibility checking done automatically from EHR data o eCRFs partially filled based on EHR information o All collected data stored in the EHR system as well as the research database • Closing the loop o eCRF data enriches the EHR o Helps the clinician o Adds value to the EHR system • Data does not go to waste! 7
    8. 8. Trust in the LHS • Research community is struggling to ensure transparency and correctness of published research • Reasons complex and interleaving (positive bias, intractable analysis, deluge of journals) • Bayer Healthcare team published work showing that only 25% of the academic studies they examined could be replicated o Prinz et al. Nat. Rev. Drug Discov. 10, 712, 2011 • Of 53 oncology studies from 2001-2011, each highlighting big new apparent advances in the field, only 11% (6) could be robustly replicated. o Begley & Ellis Nature 483, 531–533, 2012
    9. 9. Trust in the LHS (cont.) • The problem is by no means restricted to preclinical studies • Twelve randomised clinical trials tested 52 observational claims and failed to reproduce a single one o Young SS, Karr A. Deming, data and observational studies. Significance Sep 2011; 8(3):116–120 • Replication of 100 experiments published in 2008 in three high-ranking psychology journals – less than one half of the findings replicated o Estimating the reproducibility of psychological science. Science Aug 2015;349(6251) • Random sample of 441 biomedical journal articles 2000–2014: none made all their data available, one provided a full protocol, and the majority did not disclose funding or conflicts of interest o Iqbal et al. Reproducible Research Practices and Transparency across the Biomedical Literature. PLoS Biology 2016; 14(1) • Cost of irreproducible research in life science is estimated at $28 billion per year in the US o Freedman LP, Cockburn IM, Simcoe TS. The Economics of Reproducibility in Preclinical Research. PLOS Biology Jun 2015; 13(6)
    10. 10. • Each component in the healthcare system produces and consumes data: • Epidemiological research using record linkages • Research data embedded in the EHR • Decision support for diagnosis • Provenance infrastructure required to support all these domains Data in the Learning Health System Specific research data Actionable data Routinely collected data • Clinical trials • Controlled populations • Well-defined questions • EHR systems • Wide coverage • Vast quantity • May lack in detail and quality • Distilled scientific findings • Usable in clinical practice • Decision support
    11. 11. TRANSFoRm project • €7.5M European Commission 2010-2015 • Funded under the Patient Safety Work Program of FP7 • Developing methods, models, services, validated architectures and demonstrations to support: o Epidemiological research using GP records, including genotype- phenotype studies and other record linkages o Clinical trials embedded in the EHR o Decision support for diagnosis www.transformproject.eu
    12. 12. Middleware Secure data transport RCT tools (Electronic Data Collection) Epidemiological study tools (Data queries) Authentication framework Diagnostic support tools Data source connectivity module Provenance framework Vocabulary service TRANSFoRm software landscape
    13. 13. Use case 1: Type 2 Diabetes • Research Question: In type 2 diabetic patients, are selected single nucleotide polymorphisms (SNPs) associated with variations in drug response to oral antidiabetic drugs (Sulfonylurea)? • Design: Case-control study • Data: primary care databases (phenotype data) pre-linked to genomic databases (genetic risk factors) – data federation
    14. 14. Use case 2: Gastro-oesophageal reflux disease (GORD) • Research Question: What gives the best symptom relief and improvement in Quality of Life: continuous or on demand Proton Pump Inhibitor use? • Design: Randomised Controlled Trial (RCT) • Data: Collection through EHR & web based questionnaire – electronic case report forms AND mobile Patient Related Outcome Measures • Provenance and security
    15. 15. Use case 3: Diagnostic Decision Support • Early diagnostic suggestions for presenting problems: • chest pain • abdominal pain • shortness of breath • Clinical Prediction Rule web service (with underlying ontology) • Prototype Decision Support System integrated with a commercial electronic health record system • Vision by InPractice Systems
    16. 16. Provenance challenge for TRANSFoRm • Viable methods for adoption in a heterogeneous software environment o No shared workflow middleware to rely on • Need to achieve domain specificity • Able to demonstrate conformance to standards o Title 21 of the Code of Federal Regulations; Electronic Records; Electronic Signatures (21 CFR Part 11) o Good Clinical Practice (GCP) o EudraLex Vol. 4 Annex 11: Computerised Systems in EU o CONSORT, STROBE, RECORD
    17. 17. Semantic annotations • Semantic concepts in the provenance graph defined using TRANSFoRm ontologies: o Clinical Research Information Model (CRIM) o Software infrastructure ontology o Clinical evidence ontology • Ontology concepts annotations on provenance nodes • Provenance templates define domain actions that map to provenance fragments PCROM (UML Model) Randomised Clinical Trial Ontology (RCTO) Randomised Clinical Trial Provenance Ontology (RCTPO)
    18. 18. Provenance templates Provenance database Provenance server Existing tools 1. Tools are agnostic to provenance representation 2. Service invocation matches some provenance template in Provenance server 3. Template is instantiated into a provenance graph fragment with OWL concept annotations 4. Graphs merged inside the database API service calls OPM graphs annotated with OWL
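The template mechanism in steps 2–4 amounts to binding variable nodes in an abstract graph fragment with client-supplied values before the fragment is merged into the database. A minimal sketch; the `var:` prefix, the triple representation, and the function name are my assumptions, not TRANSFoRm's actual service interface:

```python
def instantiate(template_edges, bindings):
    """Instantiate a provenance template: replace variable nodes
    (marked here with a hypothetical 'var:' prefix) with concrete
    values supplied by the client application."""
    def sub(node):
        if node.startswith("var:"):
            # Unbound variables are left in place rather than failing
            return bindings.get(node[4:], node)
        return node
    return [(sub(u), rel, sub(v)) for (u, rel, v) in template_edges]

# A client fills in the study-specific details of a data-collection fragment:
template = [("var:eCRF", "wasGeneratedBy", "var:collection"),
            ("var:collection", "wasControlledBy", "var:investigator")]
fragment = instantiate(template, {"eCRF": "eCRF-42",
                                  "collection": "P-collect-7",
                                  "investigator": "Ag-gp-3"})
```

This keeps the client tools agnostic to the provenance representation: they only supply bindings, never graph structure.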
    19. 19. Example: Provenance of diagnostic recommendation
    20. 20. Provenance security • Use a single provenance graph for: o Full trial audit o Reporting studies o Publication review o Collaborators o Readers • Need to abstract parts of the graph • Access control and view generation for provenance graphs o Future Generation Computer Systems, Volume 49, August 2015, Pages 8-27 Roxana Danger, Vasa Curcin, Paolo Missier, Jeremy Bryans
    21. 21. Basic idea • The aim of an access control strategy is not only to determine if the resource can be viewed or not, but to construct a view of the graph which satisfies the security constraints • The goal is for maximum amount of information to be retained • NB Based on TRANSFoRm use cases but not implemented in the live system
    22. 22. Access control • Ensuring that a principal (person, process, etc.) can only access the services or data in a system that they are authorized to • Implemented through security policies that try to enforce a certain protection goal such as to prevent unauthorized disclosure (secrecy) and intentional or accidental unauthorized changes (integrity) • Authorizations for some resource can be: o Positive (allow) o Negative (deny)
    23. 23. Access control • Two classical approaches: o Closed policy • deny-by-default • Access to a resource is only granted if a corresponding positive authorization policy exists o Open policy • permit-by-default • Access is granted unless a corresponding negative authorization policy exists • Combined approach used to support policy exceptions • Conflict resolution needed if multiple policies apply, e.g. o denials-take-precedence o most-specific-takes-precedence o priority levels o time-dependent access
    24. 24. Access control languages for provenance • Qin Ni et al o Semantic description of subjects (user roles) and resources to be accessed o conditions under which restrictions are applied, o four different types of access permissions. • Cadenhead et al o Added regular expressions for resource and condition descriptions • Transformation-oriented Access Control Language for Provenance (TACLP) o Allows users to define subgraphs to be transformed, with three different levels of abstractions (namely hide, minimal and maximal).
    25. 25. Indirect relations • Introduce some new relations to be used in abstraction
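Following the accompanying note that an indirect relation composes the original relation with the transitive closure of was-derived-from, a plain-Python sketch for the used relation. Edge direction follows OPM's effect-to-cause convention; the relation names are assumptions:

```python
def transitive_closure(edges):
    """All reachable pairs over was-derived-from edges (effect -> cause)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    closure = set()
    for start in succ:
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            for m in succ.get(n, ()):
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        closure |= {(start, m) for m in seen}
    return closure

def indirect_used(used, wdf):
    """u+: if process P used artifact A and A wdf+ B, then P indirectly used B."""
    wdf_plus = transitive_closure(wdf)
    return {(p, b) for (p, a) in used for (a2, b) in wdf_plus if a2 == a}

# P1 used A2; A2 was derived from A1, which was derived from A0
edges = indirect_used({("P1", "A2")}, {("A2", "A1"), ("A1", "A0")})
```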
    26. 26. External effects and causes • External effects and causes of the set of nodes S w.r.t. a set of nodes R o Set of nodes that represent the immediate effects/causes of S that would be affected by removal of nodes in R from the graph V (𝑆 ⊆ 𝑅 ⊆ 𝑉) o If S=R, then denote as ef(R) and ca(R)
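With OPM edges pointing from effect to cause, this definition can be written down directly. A sketch; the helper names are mine:

```python
def external_effects(E, S, R):
    """Nodes outside R with an edge into S: the immediate effects of S
    that would survive the removal of R from the graph."""
    return {u for (u, v) in E if v in S and u not in R}

def external_causes(E, S, R):
    """Nodes outside R that edges from S point to: the immediate causes
    of S that would survive the removal of R."""
    return {v for (u, v) in E if u in S and v not in R}

# When S = R, these are the ef(R) and ca(R) of the slide:
E = {("P2", "A2"), ("A2", "P1"), ("A3", "A2")}
R = {"A2"}
ef, ca = external_effects(E, R, R), external_causes(E, R, R)
```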
    27. 27. External effects and causes
    28. 28. Basic operations • Node removal o Subgraph needs to be hidden o e.g. if it is unnecessary for an analysis or user access to it has been restricted. • Node replacement o removing details of data and operations in a subgraph while retaining some information (abstract entity) of the existence of such subgraph.
    29. 29. Operation: node removal • Let Prov = (V, E, type) and R ⊆ V be a set of nodes to be removed. The result is a new provenance graph Prov′ = (V′, E′, type′), such that:
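The defining conditions after "such that:" were lost in extraction (they appeared as an image on the slide). A plausible reconstruction, using the ef/ca notation from the preceding slides and the soft edges described later; the paper's exact formulation may differ:

```latex
\begin{align*}
V'    &= V \setminus R \\
E'    &= \{(u,v) \in E : u \in V',\ v \in V'\}
         \;\cup\; \{(u,v)_{\text{soft}} : u \in ef(R),\ v \in ca(R)\} \\
type' &= type|_{V'}
\end{align*}
```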
    30. 30. Operation: node replacement • As before, with operation AR replacing node set R with node va
    31. 31. Abstract nodes and edges • Dummy nodes introduced during entity replacement • Preserve the causality of the rest of the graph • Two types of dependencies: o Indirect • Denoted with double lines • Represent multi-step dependencies (wdf+, u+, wgb+, wtb+) o Soft dependencies • Denoted with double dashed lines • Generic transitive relationship which is not one of the above
    32. 32. Removal and Replacement Replace (A,B) Remove (A,B)
    33. 33. Removal and Replacement Replace (A,B) Remove (A,B)
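The two transformations illustrated on these slides can be sketched in a few lines, with edges carried as (source, target, type) triples. The bridging follows the note that a soft edge is introduced on removal; the paper's finer-grained indirect edge types are collapsed into a single "soft" tag here:

```python
def remove(V, E, R):
    """Remove R and bridge its external effects to its external causes
    with soft edges, preserving reachability across the hole."""
    ef = {u for (u, v, _) in E if v in R and u not in R}
    ca = {v for (u, v, _) in E if u in R and v not in R}
    Vp = V - R
    Ep = {(u, v, t) for (u, v, t) in E if u not in R and v not in R}
    return Vp, Ep | {(u, v, "soft") for u in ef for v in ca}

def replace(V, E, R, va):
    """Collapse R into a single abstract node va, redirecting the
    boundary edges to and from it."""
    Vp = (V - R) | {va}
    Ep = {(u, v, t) for (u, v, t) in E if u not in R and v not in R}
    Ep |= {(u, va, "soft") for (u, v, _) in E if v in R and u not in R}
    Ep |= {(va, v, "soft") for (u, v, _) in E if u in R and v not in R}
    return Vp, Ep

# C used B, and B was derived from A; hide B by removal or replacement
V = {"C", "B", "A"}
E = {("C", "B", "used"), ("B", "A", "wasDerivedFrom")}
V1, E1 = remove(V, E, {"B"})
V2, E2 = replace(V, E, {"B"}, "abs1")
```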
    34. 34. False dependencies • False dependencies introduce a previously non- existent path in the new graph, e.g. removing A, B
    35. 35. Causality preserving transformation • A transformation is called causality preserving if it does not introduce false dependencies. • Given a provenance graph and a set of entities to be abstracted/hidden, the question is how can these entities be joined or removed from the graph using only causality-preserving transformations?
    36. 36. Causality preserving partition and transformation • Given a set of nodes R ⊆ V, a causality preserving partition ℘ of R is such that removing or replacing any set of nodes 𝑃 ∈ ℘ will not introduce false dependencies • A graph transformation by partition ℘ of R is then a sequential application of RemP or RepP • The necessary and sufficient condition for such a transformation to be causality preserving is that for each 𝑃 ∈ ℘ all of P’s external causes and effects are connected
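Removing a set P introduces a false dependency exactly when some bridged pair of an external effect and an external cause had no path between them in the original graph. A plain-Python check of that condition (a sketch of the criterion, not the paper's algorithm):

```python
def introduces_false_dependency(E, P):
    """True if removing node set P would add a soft edge (e, c) for a
    pair with no existing causal path e ->* c in the original graph."""
    succ = {}
    for u, v in E:
        succ.setdefault(u, set()).add(v)
    def reachable(a, b):
        seen, stack = set(), [a]
        while stack:
            n = stack.pop()
            if n == b:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(succ.get(n, ()))
        return False
    ef = {u for (u, v) in E if v in P and u not in P}
    ca = {v for (u, v) in E if u in P and v not in P}
    return any(not reachable(e, c) for e in ef for c in ca)
```

For example, in X→A→Y with an unrelated B→Z, removing {A} alone is safe, but removing {A, B} together would fabricate a path from X to Z.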
    37. 37. Optimal causality preserving partition • Default partition of R consists of singletons, i.e. each node in R is a set in the partition. • Optimal partition is such that none of its sets have the same sets of external causes and effects w.r.t. R • Partitioning algorithm o Step 1, determine external causes and effects for default partition o Step 2, gradually merge the partitions until optimal.
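Step 2's merging can be approximated in a single pass by grouping the nodes of R that share identical external causes and effects with respect to R; the paper merges iteratively and proves optimality, so treat this as a simplified sketch:

```python
def optimal_partition(E, R):
    """Group nodes of R that share the same external causes and effects
    w.r.t. R; merging such nodes cannot create false dependencies.
    (Single-pass simplification of the paper's partitioning step.)"""
    groups = {}
    for n in R:
        ef = frozenset(u for (u, v) in E if v == n and u not in R)
        ca = frozenset(v for (u, v) in E if u == n and v not in R)
        groups.setdefault((ef, ca), set()).add(n)
    return list(groups.values())

# a and b share external cause/effect sets {X}/{Y}; c stands alone
E = {("X", "a"), ("X", "b"), ("a", "Y"), ("b", "Y"), ("c", "Z")}
parts = optimal_partition(E, {"a", "b", "c"})
```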
    38. 38. Provenance graph transformation algorithm • Once the partition is computed, the transformations are iteratively applied to each element in the partition • Labels input provides names for generated abstract nodes • Levels input provides abstraction level for each partition o Hide • remove operation o Minimum abstraction, maximum abstraction • replace operation • isolated singletons removed as a special case.
    39. 39. Computational efficiency • Transformation algorithm performance depends on the performance of the partition algorithm • The other steps are linear in the cardinality of the set of partitions ℘ and its edges • The partition algorithm considers pair-wise combinations of nodes. • Overall complexity is O(|R|²), where R is the set of nodes to abstract
    40. 40. Experimental results • Provenance view transformation algorithm was implemented in Python 2.7 using the NetworkX API. • Experiments were executed on Ubuntu 12.04, Intel Core i7-3687U CPU at 2.10 GHz with 8GB RAM • Synthetic provenance graphs used, randomly generating edges for each node within the degree range 2-10 • Two parameters: o the percentage of nodes to abstract (from 5 to 25 in steps of 5) o the percentage of nodes to abstract which are causally dependent (from 0 to 100 in steps of 25) • Each configuration was executed 10 times and the plots presented show the averages of these executions.
    41. 41. Performance behaviour • Execution time (Y) in seconds as a function of the number of nodes (X) and the percentage of nodes to abstract (Z) • Quadratic time
    42. 42. Use case: Access to health data • Access control for the provenance data collected from an Electronic Health Record (EHR) and clinical trial systems • Rules: o Auditors. Healthcare system auditors or law enforcement agencies can access the whole provenance graph during the auditing process. o Family doctors and patients. Electronic health records and their data provenance can only be accessed by patients during weekends, and by FDs during weekdays. o Active FDs. Active FDs have access to the provenance data associated with the EHRs of their patients and its provenance. o Clinical trial 1. If some data comes from a clinical trial, the GP needs to be a participant in the trial to see the subgraph associated with that trial. o Clinical trial 2. Patients do not have access to clinical trial processes. o Laboratory. Patients do not have access to laboratory processes. o Automatic diagnosis recommendation. Patients have no access to any information related to the automatic diagnosis recommendation, nor to the graph segment connecting it with the clinical evidence.
    43. 43. TACLP • Transformation-oriented Access Control Language for Provenance (TACLP) • Extends the works of Ni and Cadenhead by introducing transformations • A policy consists of: o Target o Effect o Transformation o Condition (optional) o Obligation (optional)
    44. 44. TACLP Target • Subject element o Set of users (subject element) to which the policy should be applied, expressed through IRI references • Record element o Set of resources to which the policy should be applied, expressed through IRI references • Restriction element (optional) o A conditional expression under which the policy is applied o Either a relational comparison between a value in a property path and a literal, or a full logical expression. • Scope element (optional) o If the policy is ‘transferable’ or ‘non-transferable’ with respect to subjects o Whether it applies to all the ancestors of matched elements in the graph, or to the matched elements only.
    45. 45. TACLP Effect • Specifies the intended outcome • Four possibilities: o Absolute permit guarantees access to the graph regardless of the effect of other policies • e.g. for allowing access to auditors or law enforcement agencies, and avoids the need for additional conditions in deny policies o Deny guarantees that certain parts of the graph will not be accessed by users in the subject element. o Necessary permit is used to describe the necessary, but not always sufficient, conditions for accessing certain parts of the graphs o Permit is used to describe those parts of the graph that can be accessed if there are no other policies denying access to it.
    46. 46. TACLP Transformation • How to transform the provenance graph in order to hide certain resources • Specification of which nodes need to be hidden and Removal/Replace operations to be applied to them • Set of policies comprising o Policy type (target, record, condition, effect, transformation element and obligation) o Policy evaluation type (deny-takes-precedence or permit-takes-precedence)
    47. 47. TACLP Transformation • Abstraction level o Hide • matched nodes of the subgraph have to be completely hidden (removed) from the graph • Remove transformation is applied; o Minimum abstraction • Replace transformation is applied • No caused-by relationship (soft dependencies) will appear in the transformed graph. o Maximum abstraction • Replace transformation is applied • Soft dependencies can appear in the transformed graph.
    48. 48. Access control evaluation algorithms • Aim to produce an abstracted graph that satisfies the constraints • Deny-takes-precedence 1. Absolute permit policies evaluated first 2. Necessary permit and deny policies 3. Permit policies • Allow-takes-precedence 1. Absolute permit evaluated first 2. Necessary permit policies 3. Permit policies 4. Deny policies
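The policy structure from slide 43 and the two evaluation orderings above can be sketched as follows; the field names and effect strings are illustrative, not TACLP's concrete syntax:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Policy:
    """One TACLP policy; fields mirror the components listed on slide 43."""
    target_subject: str                    # IRI pattern for users
    target_record: str                     # IRI pattern for resources
    effect: str                            # absolute_permit | necessary_permit | deny | permit
    transformation: Optional[str] = None   # hide | min_abstraction | max_abstraction
    condition: Optional[str] = None
    obligation: Optional[str] = None

def evaluation_order(policies, deny_takes_precedence=True):
    """Sort policies into the evaluation sequence of the two strategies:
    absolute permits first; under deny-takes-precedence, necessary permits
    and denies next, then permits; otherwise denies are evaluated last."""
    if deny_takes_precedence:
        rank = {"absolute_permit": 0, "necessary_permit": 1, "deny": 1, "permit": 2}
    else:
        rank = {"absolute_permit": 0, "necessary_permit": 1, "permit": 2, "deny": 3}
    return sorted(policies, key=lambda p: rank[p.effect])

# Illustrative policies: patients denied, auditors absolutely permitted, GPs permitted
ps = [Policy("pt:*", "ehr:*", "deny", "hide"),
      Policy("auditor:*", "prov:*", "absolute_permit"),
      Policy("gp:*", "ehr:*", "permit")]
ordered = [p.effect for p in evaluation_order(ps)]
```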
    49. 49. Example: Source provenance graph
    50. 50. Example: Abstracted provenance graph
    51. 51. Summary • Learning Health System presenting new set of challenges for medical and informatics communities • Provenance can help establish trust in the LHS • Methods needed to verify trust • Abstraction of provenance traces needed to address requirements of multiple stakeholders o Researchers o Regulators o Publishers • Future work o Projects running on provenance of decision support and visual analytics for health data o Looking for partnerships to investigate applications of the security work
    52. 52. Acknowledgements • Thanks to: o Roxana Danger o Paolo Missier o Jeremy Bryant o Derek Corrigan o Brendan Delaney
    53. 53. Questions?
    54. 54. Thank you!
