Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Towards Generating Policy-compliant Datasets

"Towards Generating Policy-compliant Datasets" by Christophe Debruyne, Harshvardhan J. Pandit, Dave Lewis, Declan O’Sullivan. Presented at the The 13th IEEE International Conference on SEMANTIC COMPUTING
Jan 30 - Feb 1, 2019, Newport Beach, California

  • Be the first to comment

  • Be the first to like this

Towards Generating Policy-compliant Datasets

  1. 1. Towards Generating Policy-compliant Datasets Christophe Debruyne, Harshvardhan J. Pandit, Dave Lewis, Declan O’Sullivan Trinity College Dublin – {first.last} 2019-01-31 @ ICSC The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
  2. 2. www.adaptcentre.ieMotivation • Datasets are created and used for a specific purpose, but such data processing is the subject of various internal and external regulations – e.g., GDPR. People must give their informed consent for that purpose. • Research question: Can we generate datasets for a specific purpose “just in time” that complies with informed consent? • Unlike other initiatives (which we will discuss later), we focus on consent information that has been stored, not the context (e.g., via a form) nor the process. 1/31/19 2
  3. 3. www.adaptcentre.ieMethod • R2DQB*, pronounced R-2-D-cube, proposes a method to annotate RDF Data Cube data structure definitions to generate R2RML mappings that will create a RDF Data Cube dataset. • How can we extend or adopt this method for generating compliant datasets? 1/31/19 3 *Christophe Debruyne, Dave Lewis, Declan O'Sullivan: Generating Executable Mappings from RDF Data Cube Data Structure Definitions. OTM Conferences (2) 2018: 333-350 • Data Structure Definitions • Dimensions • Measures • Attributes • References to tables • References to columns • Transformation functions • … Mapping Engine R2RML Mapping R2RML Processor Data Cube Dataset extended with according to 1 2 3 Validation 4 Provenance Information captured with 5
  4. 4. www.adaptcentre.ieMethod – The Model • Modelled using Object Role Modelling (ORM) and translated into OWL • Policy, Purpose, and Inclusion • Consenting Party, Consent, Date and Times • External uniqueness constraints indicate how inclusion and consent are identified • Integration of PROV-O allow for revisions of inclusions, consent, and policy • Consenting Party ≡ prov:Person ⊔ prov:Organization 1/31/19 4
  5. 5. www.adaptcentre.ieMethod – Using the ontology DESCRIBE ?consent WHERE { ?consent ont:forInclusion ?inclusion . { # GET THE LATEST INCLUSION OF PURPOSE FOR A POLICY SELECT ?inclusion WHERE { ?inclusion ont:ofPurpose <.../purpose> . ?inclusion ont:ofPolicy <.../policy> . <.../policy> dcterms:created ?dt . } ORDER BY DESC(?dt) LIMIT 1 } ?consent ont:givenBy ?user . ?consent ont:registeredOn ?datetime . # GET THE LATEST CONSENT INFORMATION FOR EACH USER FILTER NOT EXISTS { [ ont:forInclusion ?inclusion ; ont:givenBy ?user ; ont:registeredOn ?datetime2 ] FILTER(?datetime2 > ?datetime) } } 1/31/19 5 Retrieving the latest consent information for a specific purpose of the latest version of a policy. DESCRIBE query returns a graph which we will use to manipulate the dataset.
  6. 6. www.adaptcentre.ieExtending R2DQB 1. # PREFIXES OMMITED FOR BREVITY 2. @base <> 3. dct:identifier a rdf:Property, qb:DimensionProperty ; 4. rr:template "{id}" ; 5. rdfs:label "user id"@en ; 6. rdfs:subPropertyOf sdmx-dimension:refPeriod ; 7. rdfs:range owl:Thing . 8. foaf:mbox a rdf:Property, qb:MeasureProperty ; 9. rr:template "mailto:{email}" ; 10. rdfs:label "email address"@en ; 11. rdfs:subPropertyOf sdmx-measure:obsValue ; 12. rdfs:range owl:Thing . 13. <#dsd-le> a qb:DataStructureDefinition; 14. rr:tableName "user"; 15. ont:forPurpose <> ; 16. ont:forPolicy <> ; 17. qb:component [ qb:dimension dct:identifier ]; 18. qb:component [ qb:measure foaf:mbox ] . 1/31/19 6 A simple data structure definition (DSD). Yellow à Annotations for the generation of an R2RML mapping Cyan à Linking the DSD to a purpose of a policy
  7. 7. www.adaptcentre.ieExtending R2DQB • The annotated Data Structure Definition is given to the R2DQB engine to generate an R2RML mapping • The R2RML mapping is executed (*) resulting in a graph G1 • We execute the DESCRIBE query introduced earlier, resulting in a graph G2. This graph is used to create a list of consent instances (URIs) where the property isGiven is true. • We then use that list to apply the following query to G1 to create a graph G1’ only retaining the information of people who have given their consent DESCRIBE ?obs ?dataset WHERE { ?obs a qb:Observation . ?obs qb:dataSet ?dataset . ?obs dct:identifier ?dim . VALUES ?dim { <uri1> … <urin> } } 1/31/19 7 We use the R2RML-F (in case the underlying DB technology does not support certain functions): Christophe Debruyne, Declan O'Sullivan: R2RML-F: Towards Sharing and Executing Domain Logic in R2RML Mappings. LDOW@WWW 2016
  8. 8. www.adaptcentre.ieImplementation details • Both consent information and endpoint are behind the service, one just needs an annotated DSD. Governance platforms can be adopted to guide one to identifiers for a policy and purpose. • Intermediate graphs that are generated allow for a posteriori analysis and, • Both governance problems are outside the scope of this study and considered future work. 1/31/19 8
  9. 9. www.adaptcentre.ieDemonstration • Due to a lack of data, our method was demonstrated using a synthetic dataset. This dataset was created by simulating people giving and withdrawing their consent. Consent can also expire. • Paper provides details about the dataset (*). • We believe we demonstrated the viability of our approach, though more experiments are called for. 1/31/19 9 (*)
  10. 10. www.adaptcentre.ieRelated Work • Both SPIRIT (Westphal et al. 2018) and SPECIAL (Kirrane et al. 2018) studies have similar concepts (data subject, purpose, …), but aim to analyze compliance a posteriori (both) or a priori (SPECIAL). Our goal was to generate compliant datasets “just in time” • Pandit, O’Sullivan and Lewis (2018) proposed an ontology for the operational representation of informed consent and allows one to analyze these representations w.r.t. annotated logs and questionnaires. • Fatema et al (2017) proposed a model for representing the informed consent an organization has obtained, but there is no explicit notion of policies or support for revisions. • Given the overlap in terminology, it is clear there is an opportunity in aligning the vocabularies. 1/31/19 10
  11. 11. www.adaptcentre.ieSummary and Conclusion • We argued that datasets are used for a purpose and that datasets should be built suitable for a purpose, including any policies it should comply with. Can we generate datasets for a specific purpose “just in time” that complies with informed consent? The answer is yes. How? 1. By adopting a method for generating R2RML mappings from annotated Data Structure Definitions that in turn generate datasets 2. By creating an ontology for informed consent (and revisions thereof) and link aforementioned DSDs to purposes in policies 3. We use a knowledge base using that ontology to manipulate the generated dataset – from (1) – retaining the data that is compliant 4. All intermediate graphs allow one to trace the various steps 1/31/19 11
  12. 12. www.adaptcentre.ieFuture work A current limitation is a lack of evaluation beyond the synthetic dataset created for the study. We furthermore recognize the opportunities in aligning or integrating our models and approach with related work (ongoing). See for more of our projects. 1/31/19 12

    Be the first to comment

    Login to see the comments

"Towards Generating Policy-compliant Datasets" by Christophe Debruyne, Harshvardhan J. Pandit, Dave Lewis, Declan O’Sullivan. Presented at the The 13th IEEE International Conference on SEMANTIC COMPUTING Jan 30 - Feb 1, 2019, Newport Beach, California


Total views


On Slideshare


From embeds


Number of embeds