wdaqua.eu
BOUNCER: Privacy-aware Query Processing Over
Federations of RDF Datasets
Kemele M. Endris, Zuhair Almhithawi, Ioanna Lytra, Maria-Esther Vidal, Sören Auer
DEXA 2018 - September 3 - 6, 2018
Regensburg, Germany
Innovation Training
Network (ITN)
Data privacy
An individual’s right to preserve control over access to personal data, e.g., health information, genomic data, and political and religious preferences.
http://www.oneadvisory.london/gdpr-eu-general-data-protection-regulation/
Motivating Example
[Figure: a hospital source with clinical records (Patient, Liquid Biopsy) and a source with genomic data (Mutation, Gene).]
Hospital icon: https://commons.wikimedia.org/wiki/File:Green_hospital_icon.svg
RDF Graphs
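[Figure: RDF graph of the genomic data. A Mutation is linked to a Gene via :located_in.]
Predicates shown: :mutation_aa, :mutation_cds, :mutation_loci, :mentioned_in, :located_in, :geneName, :acc_num
Values shown: ‘p.T790M’, ‘c.2369C>T’, ‘7:55181378’, 15604253, 22980975, ‘EGFR’, ‘ENST00000275493’
Public data!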
RDF Graphs
[Figure: RDF graph of the clinical records. A Patient is linked to a Liquid Biopsy via :biopsy.]
Predicates shown: :name, :gender, :address, :birthdate, :smoking, :egfr_mutated, :biopsy, :targetTotal, :mutation_aa
Values shown: ‘Alice Bob’, ‘Female’, ‘Einstein str. 1, 10100, Germany’, ‘03/03/1923’, ‘False’, ‘True’, 0.63%, ‘p.T790M’
Annotations: Sensitive data. Controlled data! Local join permitted! No operation permitted!
SPARQL Query
PREFIX ex: <http://example.com/vocab/>
SELECT DISTINCT ?mutation ?loci ?pubmedid ?accNum
WHERE {
  ?lbiop   ex:mutation_aa   ?mutation .      # 1
  ?lbiop   ex:targetTotal   ?targetTotal .   # 2
  ?patient ex:biopsy        ?lbiop .         # 3
  ?patient ex:smoking       "False" .        # 4
  ?patient ex:egfr_mutated  "True" .         # 5
  ?cmut    ex:mutation_aa   ?mutation .      # 6
  ?cmut    ex:mentioned_in  ?pubmedid .      # 7
  ?cmut    ex:mutation_loci ?loci .          # 8
  ?cmut    ex:located_in    ?gene .          # 9
  ?gene    ex:gene_name     "EGFR" .         # 10
  ?gene    ex:acc_num       ?accNum .        # 11
}
Genes associated with non-smoking lung cancer patients whose liquid biopsy has been studied for somatic mutations that involve EGFR gene amplification:
● Mutation name
● Genomic coordinates of the mutations
● PubMed ID
● Accession numbers
Different Execution Plans
Running the query over different federated query engines: FedX, ANAPSID, and MULDER.
[Figure: the execution plan that each engine generates over triple patterns 1 to 11, built from dependent JOIN and independent JOIN operators.]
Invalid Executions of Query Plans
[Figure: the FedX, ANAPSID, and MULDER plans, each annotated with the policy violation it causes.]
● Two of the plans violate the AC policy of the hospital by extracting data about patient and biopsy.
● One plan violates the AC policy of the hospital by extracting data about patient.
Agenda
1. BOUNCER: A privacy-aware federated query
engine over SPARQL endpoints
2. Empirical Evaluation
3. Conclusions and Lessons Learned
BOUNCER
Privacy-Aware
Query Engine
BOUNCER: Architecture
BOUNCER describes data sources using
Privacy-aware RDF Molecule Templates
(PRDF-MTs)
Privacy-aware RDF Molecule Templates (PRDF-MTs)
• PRDF-MTs characterize RDF molecules that share the same characteristics, e.g., ex:geneName, ex:mentioned_in
• PRDF-MTs are defined in terms of:
• RDF Class
• Set of predicates
• Set of access control operations associated with the
predicates
• Intra and inter links between PRDF-MTs
• API interfaces to access the data, e.g., SPARQL endpoint
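As a rough illustration of the components listed above, one PRDF-MT could be held in a small record like the following Python sketch. The class name `PRDFMT`, its field names, the class IRIs, the endpoint URL, and the use of `None` for a property with no permitted operation are all assumptions made for this example, not BOUNCER's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PRDFMT:
    """Hypothetical record for one Privacy-aware RDF Molecule Template."""
    rdf_class: str
    # predicate -> permitted operation name; None = no operation permitted (assumed encoding)
    predicates: Dict[str, Optional[str]]
    # predicate -> RDF class of the linked PRDF-MT (intra- or inter-source link)
    links: Dict[str, str] = field(default_factory=dict)
    # access interfaces, e.g. SPARQL endpoint URLs
    endpoints: List[str] = field(default_factory=list)

    def operation_for(self, predicate: str) -> Optional[str]:
        """Return the operation permitted on a predicate, or None if there is none."""
        return self.predicates.get(predicate)


# Example instance loosely following the Patient template of the slides.
patient = PRDFMT(
    rdf_class="ex:Patient",  # hypothetical class IRI
    predicates={
        "ex:biopsy": "JoinLocal",
        "ex:smoking": "JoinLocal",
        "ex:egfr_mutated": "JoinLocal",
        "ex:address": None,  # assumption: treated as "no operation permitted"
    },
    links={"ex:biopsy": "ex:LiquidBiopsy"},
    endpoints=["http://hospital.example.org/sparql"],  # placeholder URL
)

print(patient.operation_for("ex:smoking"))
```

Keeping the operation next to each predicate means a planner can look up the policy of a triple pattern directly from its predicate.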
Privacy-aware RDF Molecule Templates (PRDF-MTs)
[Figure: PRDF-MTs for Mutation and Gene, linked via ex:located_in.]
Predicates shown: ex:mentioned_in, ex:mutation_cds, ex:mutation_loci, ex:mutation_aa, ex:located_in, ex:acc_num, ex:geneName
Projecting values is allowed! (Values are public)
Privacy-aware RDF Molecule Templates (PRDF-MTs)
[Figure: PRDF-MTs for Patient and Liquid Biopsy, linked via ex:biopsy.]
Predicates shown: ex:biopsy, ex:address, ex:name, ex:gender, ex:birthdate, ex:targetTotal, ex:mutation_aa, ex:egfr_mutated, ex:smoking
Annotations: JoinLocal (join operation permitted locally!); No operation permitted!
Privacy-aware Operations
1. Join Local
○ Join operation allowed only at the server (on the premises of the publisher or owner)
2. Join at Federation Mediator
○ Join operation allowed both at the server and at the mediator, i.e., the federation engine
○ Values cannot be projected out of the mediator
3. Project (public)
○ The values of the property p can be projected from the dataset to the user
○ Join operation is allowed both at the server and at the mediator
The access control policy of a dataset is composed of the privacy-aware operations that can be performed on the properties in the dataset.
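The three operations above form a ladder: each level permits everything the previous one does. A minimal Python sketch of that reading follows. The names `Op`, `can_join_at_server`, `can_join_at_mediator`, and `can_project` are hypothetical, and representing "no operation permitted" as `None` is an assumption.

```python
from enum import IntEnum
from typing import Optional


class Op(IntEnum):
    """Privacy-aware operations, ordered from most to least restrictive."""
    JOIN_LOCAL = 1     # join only at the publisher's server
    JOIN_MEDIATOR = 2  # join at server or mediator; values stay inside the mediator
    PROJECT = 3        # values may reach the user; joins allowed at server and mediator


def can_join_at_server(op: Optional[Op]) -> bool:
    # Every listed operation allows a join at the server.
    return op is not None


def can_join_at_mediator(op: Optional[Op]) -> bool:
    return op is not None and op >= Op.JOIN_MEDIATOR


def can_project(op: Optional[Op]) -> bool:
    return op is Op.PROJECT


# An access control policy maps properties to operations (illustrative values).
policy = {"ex:smoking": Op.JOIN_LOCAL, "ex:mutation_loci": Op.PROJECT}

for prop, op in policy.items():
    print(prop, can_join_at_server(op), can_join_at_mediator(op), can_project(op))
```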
Privacy-aware Source Selection & Decomposition
1. BOUNCER creates a query decomposition with service graph patterns (SGPs) of star-shaped subqueries built according to PRDF-MTs
○ A star-shaped subquery (SSQ) is a set of triple patterns that share the same subject
2. BOUNCER respects the privacy and access control policies of the selected sources
○ It minimizes execution time and maximizes answer completeness by selecting only relevant sources that allow at least one access operation
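The grouping step behind star-shaped subqueries can be sketched in a few lines of Python: triple patterns are bucketed by subject. The sketch uses the eleven triple patterns of the example query and covers only the grouping. Matching each SSQ against PRDF-MTs and sources is left out, and the function name `star_shaped_subqueries` is hypothetical.

```python
from collections import defaultdict

# The eleven triple patterns of the example query as (subject, predicate, object).
TRIPLE_PATTERNS = [
    ("?lbiop", "ex:mutation_aa", "?mutation"),
    ("?lbiop", "ex:targetTotal", "?targetTotal"),
    ("?patient", "ex:biopsy", "?lbiop"),
    ("?patient", "ex:smoking", '"False"'),
    ("?patient", "ex:egfr_mutated", '"True"'),
    ("?cmut", "ex:mutation_aa", "?mutation"),
    ("?cmut", "ex:mentioned_in", "?pubmedid"),
    ("?cmut", "ex:mutation_loci", "?loci"),
    ("?cmut", "ex:located_in", "?gene"),
    ("?gene", "ex:gene_name", '"EGFR"'),
    ("?gene", "ex:acc_num", "?accNum"),
]


def star_shaped_subqueries(patterns):
    """Group triple patterns by subject; each group is one SSQ.

    Returns a dict mapping each subject to the 1-based numbers of its patterns.
    """
    groups = defaultdict(list)
    for number, (subject, _predicate, _obj) in enumerate(patterns, start=1):
        groups[subject].append(number)
    return dict(groups)


ssqs = star_shaped_subqueries(TRIPLE_PATTERNS)
print(ssqs)
# {'?lbiop': [1, 2], '?patient': [3, 4, 5], '?cmut': [6, 7, 8, 9], '?gene': [10, 11]}
```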
Star-shaped Subqueries (SSQs)
[Figure: the example query decomposed into four star-shaped subqueries (SSQ1 to SSQ4), one per subject variable.]
● ?lbiop: triple patterns 1, 2
● ?patient: triple patterns 3, 4, 5
● ?cmut: triple patterns 6, 7, 8, 9
● ?gene: triple patterns 10, 11
The SSQs are connected through ex:biopsy, ex:mutation_aa, and ex:located_in.
BOUNCER: Privacy-aware Source Selection
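[Figure: each SSQ is matched to a PRDF-MT and a source; every triple pattern carries the AC operation of its predicate. The SSQs are connected through ex:biopsy, ex:mutation_aa, and ex:located_in.]
SSQ (triple patterns)   PRDF-MT         Source             AC operations
3, 4, 5                 Patient         Clinical records   JoinLocal, JoinLocal, JoinLocal
1, 2                    Liquid Biopsy   Clinical records   JoinLocal, JoinLocal
6, 7, 8, 9              Mutation        Genomic data       Public, Public, Public, Public
10, 11                  Gene            Genomic data       Public, Public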
BOUNCER: Privacy-aware Source Selection
[Figure: SSQs answered by the same source are merged; the two merged SSQs are connected through ex:mutation_aa.]
SSQ (triple patterns)   PRDF-MTs                  AC operations
1, 2, 3, 4, 5           Patient & Liquid Biopsy   JoinLocal (all)
6, 7, 8, 9, 10, 11      Mutation & Gene           Public (all)
Joins can be performed locally
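The merge shown on this slide can be approximated as follows: SSQs selected from the same source are combined so that the joins between them run at that source. This is a simplified Python sketch. The source labels `clinical` and `genomic`, the dictionary layout, and the function name `merge_by_source` are assumptions, and the sketch does not check that the merged SSQs actually share a join variable.

```python
SSQS = [
    {"prdf_mt": "Patient", "source": "clinical", "patterns": [3, 4, 5]},
    {"prdf_mt": "Liquid Biopsy", "source": "clinical", "patterns": [1, 2]},
    {"prdf_mt": "Mutation", "source": "genomic", "patterns": [6, 7, 8, 9]},
    {"prdf_mt": "Gene", "source": "genomic", "patterns": [10, 11]},
]


def merge_by_source(ssqs):
    """Combine SSQs that are answered by the same source into one subquery."""
    merged = {}
    for ssq in ssqs:
        entry = merged.setdefault(ssq["source"], {"prdf_mts": [], "patterns": []})
        entry["prdf_mts"].append(ssq["prdf_mt"])
        entry["patterns"].extend(ssq["patterns"])
    for entry in merged.values():
        entry["patterns"].sort()
    return merged


merged = merge_by_source(SSQS)
for source, entry in merged.items():
    print(source, entry["prdf_mts"], entry["patterns"])
```

With the four SSQs of the example query, this yields one subquery per source: triple patterns 1 to 5 for the clinical records and 6 to 11 for the genomic data.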
BOUNCER: Privacy-aware Query Planning
● BOUNCER applies a greedy, heuristic-based approach to generate a bushy plan
○ Leaves correspond to star-shaped subqueries (SSQs)
● BOUNCER finds a valid plan that respects the privacy policy of the data sources
○ Physical operators are selected based on the access policy of the data sources
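One way to read "selects physical operators based on the access policy" is sketched below in Python. The rule set is an assumption for illustration, not BOUNCER's documented algorithm: if both operands allow a join at the mediator, an independent join is used; if one operand only allows local joins, a dependent join sends bindings from the other operand to the restricted source; otherwise no valid operator exists. The operation names and the function `choose_join` are hypothetical.

```python
from typing import Optional, Tuple

# Operations that permit joining at the federation mediator (assumed names).
MEDIATOR_OK = {"JoinMediator", "Public"}


def choose_join(left_op: str, right_op: str) -> Optional[Tuple[str, Optional[str]]]:
    """Pick a join operator from the operation permitted on the join
    predicate of each operand.

    Returns (operator, restricted_side), or None if no operator is valid.
    """
    if left_op in MEDIATOR_OK and right_op in MEDIATOR_OK:
        # Both sides may be joined at the mediator.
        return ("independent", None)
    if left_op == "JoinLocal" and right_op in MEDIATOR_OK:
        # Bindings from the right operand are evaluated at the left source.
        return ("dependent", "left")
    if right_op == "JoinLocal" and left_op in MEDIATOR_OK:
        return ("dependent", "right")
    # E.g. two different sources that both allow only local joins.
    return None


print(choose_join("JoinLocal", "Public"))
```

For the example query, the join on ?mutation pairs a JoinLocal operand (Patient & Liquid Biopsy) with a Public one (Mutation & Gene), so the sketch returns a dependent join.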
BOUNCER: Privacy-aware Query Planning
[Figure: the plan joins the merged SSQ for Patient & Liquid Biopsy (triple patterns 1 to 5) with the merged SSQ for Mutation & Gene (triple patterns 6 to 11) using a dependent JOIN.]
The operator respects the privacy policy of binding values for ex:mutation_aa.
Empirical Evaluation
Experimental Setup
Research Questions (RQs):
RQ1) Does privacy-aware enforcement employed during source selection,
query decomposition, and planning impact query execution time?
RQ2) Can privacy-aware policies be used to identify query plans that
enhance execution time and answer completeness?
Experimental Setup
Benchmark:
The Berlin SPARQL Benchmark (BSBM) dataset
■ 200M triples
■ 14 queries
Metrics
■ Execution time: elapsed time between the submission of a query to an
engine and the delivery of the answers (timeout: 300 sec)
■ Throughput: number of answers produced per second
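The two metrics can be computed as in the Python sketch below. The helper names `throughput` and `measure` are hypothetical, and `run_query` stands for any callable that returns the list of answers. The sketch only flags a timeout after the fact; it does not interrupt a running query.

```python
import time

TIMEOUT_SECONDS = 300.0  # timeout stated on the slide


def throughput(num_answers: int, execution_time: float) -> float:
    """Number of answers produced per second."""
    if execution_time <= 0:
        raise ValueError("execution_time must be positive")
    return num_answers / execution_time


def measure(run_query):
    """Time a callable that returns a list of answers."""
    start = time.perf_counter()
    answers = run_query()
    elapsed = time.perf_counter() - start
    return {
        "answers": len(answers),
        "execution_time": elapsed,
        "timed_out": elapsed > TIMEOUT_SECONDS,
    }


# 600 answers delivered in exactly the timeout window give 2 answers per second.
print(throughput(600, TIMEOUT_SECONDS))  # 2.0

result = measure(lambda: ["answer-1", "answer-2", "answer-3"])
print(result["answers"], result["timed_out"])
```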
Experiment I: Decomposition and Planning Time
● Goal: Assess impact of access control enforcement during source
selection, decomposition and planning on the overall query execution
time. (RQ1)
● All properties in the federation are public.
● MULDER and BOUNCER are compared on:
a. Decomposition and planning time
b. Query execution time
Experiment I: Decomposition and Planning Time
[Figure: decomposition and planning time of MULDER and BOUNCER.]
BOUNCER consumes more time in query decomposition and planning.
Enforcing privacy and access control is costly.
Experiment I: Overall Execution Time
[Figure: overall query execution time of MULDER and BOUNCER, with the annotation "Simple query".]
BOUNCER query plans speed up execution time.
Effective and valid plans can be identified.
Experiment II
● Goal: Assess the impact of privacy-aware query plans
● Privacy policy:
a. Local Join: all properties of Person, Producer, Product, and ProductFeature
b. Project (Public): all properties of Offer, Review, ProductType, and Vendor
● Federated Query Engines
a. FedX
b. ANAPSID
c. MULDER
d. BOUNCER
Experiment II: Efficiency of Query Plans
[Figure: efficiency of the query plans generated by FedX, ANAPSID, MULDER, and BOUNCER.]
Existing engines can generate executable plans by chance.
BOUNCER always generates valid plans.
BOUNCER's valid plans can be more efficient than other plans.
Conclusions and Lessons Learned
● Sources and data privacy policies can be described in terms of PRDF-MTs.
● BOUNCER is a privacy-aware federated query engine.
● Enforcing data privacy and access control is costly.
● Efficient and valid plans can be identified.
● BOUNCER always identifies valid plans and can outperform existing federated SPARQL engines.
CONTACT
Kemele M. Endris
Forschungszentrum L3S
Leibniz Universität Hannover
Welfengarten 1B
30167, Hannover, Germany
email: endris@L3S.de
phone: +49-151-762-14695
Prof. (Uni.Simon Bolivar) Dr. Maria-Esther Vidal
Scientific Data Management
Technische Informationsbibliothek (TIB)
Welfengarten 1B
30167, Hannover, Germany
email: Maria.Vidal@tib.eu
phone: +49-115-762-14690
Thank you for your attention!

The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 

BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets

  • 1. BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets. Kemele M. Endris, Zuhair Almhithawi, Ioanna Lytra, Maria-Esther Vidal, Sören Auer. DEXA 2018, September 3-6, 2018, Regensburg, Germany. wdaqua.eu, Innovation Training Network (ITN).
  • 2. Data privacy: an individual's right to preserve access control over personal data, e.g., health information, genomics, and political and religious preferences.
  • 3. Data privacy: an individual's right to preserve access control over personal data, e.g., health information, genomics, and political and religious preferences. Source: http://www.oneadvisory.london/gdpr-eu-general-data-protection-regulation/
  • 4. Motivating Example. Clinical Records: Patient, Liquid Biopsy. Genomic data: Mutation, Gene. Icon: https://commons.wikimedia.org/wiki/File:Green_hospital_icon.svg
  • 5. RDF Graphs (Public Data!). Nodes: Mutation, Gene. Predicates: :mutation_cds, :mutation_loci, :mutation_aa, :mentioned_in, :located_in, :geneName, :acc_num. Values shown: 'c.2369C>T', '7:55181378', 'p.T790M', 15604253, 22980975, 'ENST00000275493', 'EGFR'.
  • 6. RDF Graphs. Nodes: Patient, Liquid Biopsy. Predicates: :name, :gender, :address, :birthdate, :smoking, :egfr_mutated, :biopsy, :targetTotal, :mutation_aa. Values shown: 'Alice Bob', 'Female', 'Einstein str. 1, 10100, Germany', '03/03/1923', 'False', 'True', 0.63%, 'p.T790M'. Annotations: Sensitive Data (No Operation Permitted!); Controlled Data (Local Join Permitted!).
  • 7. SPARQL Query over Clinical Records (Patient, Liquid Biopsy) and Genomic data (Mutation, Gene). Question: genes associated with non-smoking lung cancer patients whose liquid biopsy has been studied for somatic mutations that involve EGFR gene amplification, returning the mutation name, the genomic coordinates of the mutations, the PubMed ID, and the accession numbers. Query (triple patterns are numbered 1 to 11 in order of appearance): PREFIX ex: <http://example.com/vocab/> SELECT DISTINCT ?mutation ?loci ?pubmedid ?accNum WHERE { ?lbiop ex:mutation_aa ?mutation . ?lbiop ex:targetTotal ?targetTotal . ?patient ex:biopsy ?lbiop . ?patient ex:smoking "False" . ?patient ex:egfr_mutated "True" . ?cmut ex:mutation_aa ?mutation . ?cmut ex:mentioned_in ?pubmedid . ?cmut ex:mutation_loci ?loci . ?cmut ex:located_in ?gene . ?gene ex:gene_name "EGFR" . ?gene ex:acc_num ?accNum . }
  • 8. Different Execution Plans. Running the query from slide 7 over different federated query engines. [Figure: query plans produced by FedX, ANAPSID, and MULDER over triple patterns 1 to 11, built from Dependent JOIN and Independent JOIN operators.]
  • 9. Invalid Executions of Query Plans. [Figure: the FedX, ANAPSID, and MULDER plans over triple patterns 1 to 11.] All three plans violate the access control (AC) policy of the hospital: two by extracting data about patient and biopsy, one by extracting data about patient.
  • 10. Agenda: (1) BOUNCER: a privacy-aware federated query engine over SPARQL endpoints; (2) Empirical Evaluation; (3) Conclusions and Lessons Learned.
  • 12. BOUNCER: Architecture. BOUNCER describes data sources using Privacy-aware RDF Molecule Templates (PRDF-MTs).
  • 14. Privacy-aware RDF Molecule Templates (PRDF-MTs). PRDF-MTs characterize RDF molecules that share the same characteristics, e.g., ex:geneName, ex:mentioned_in. PRDF-MTs are defined in terms of: an RDF class; a set of predicates; a set of access control operations associated with the predicates; intra- and inter-links between PRDF-MTs; and API interfaces to access the data, e.g., a SPARQL endpoint.
  • 15. Privacy-aware RDF Molecule Templates (PRDF-MTs). Templates: Mutation, Gene. Predicates: ex:mentioned_in, ex:mutation_cds, ex:mutation_loci, ex:mutation_aa, ex:acc_num, ex:geneName, ex:located_in. Projecting values is allowed (values are public).
  • 16. Privacy-aware RDF Molecule Templates (PRDF-MTs). Templates: Patient, Liquid Biopsy. Predicates: ex:biopsy, ex:address, ex:name, ex:gender, ex:birthdate, ex:targetTotal, ex:mutation_aa, ex:egfr_mutated, ex:smoking. Annotations: JoinLocal (join operation permitted locally); No Operation Permitted.
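
A minimal Python sketch of how one such template could be held in memory, following the five parts listed on slide 14. The class name `PRDFMT`, the class IRIs `ex:Gene`, `ex:Patient`, and `ex:LiquidBiopsy`, the operation labels, and the endpoint URLs are all assumptions made for illustration; the assignment of "no operation" to name, gender, address, and birthdate is one reading of slides 6 and 16, not something the transcript states per predicate.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PRDFMT:
    """Illustrative container for one Privacy-aware RDF Molecule Template."""
    rdf_class: str
    # predicate -> permitted operation; None stands for "no operation permitted"
    predicates: Dict[str, Optional[str]]
    # predicate -> RDF class it links to (links between templates)
    links: Dict[str, str] = field(default_factory=dict)
    # access interface; the URLs used below are placeholders, not real endpoints
    endpoint: str = ""

    def permitted(self, predicate: str) -> Optional[str]:
        """Return the operation allowed on a predicate, or None."""
        return self.predicates.get(predicate)


gene = PRDFMT(
    rdf_class="ex:Gene",  # assumed class IRI
    predicates={"ex:geneName": "Project", "ex:acc_num": "Project"},
    endpoint="http://example.com/genomics/sparql",
)

patient = PRDFMT(
    rdf_class="ex:Patient",  # assumed class IRI
    predicates={
        "ex:smoking": "JoinLocal",
        "ex:egfr_mutated": "JoinLocal",
        "ex:biopsy": "JoinLocal",
        "ex:name": None,       # assumed sensitive
        "ex:gender": None,     # assumed sensitive
        "ex:address": None,    # assumed sensitive
        "ex:birthdate": None,  # assumed sensitive
    },
    links={"ex:biopsy": "ex:LiquidBiopsy"},
    endpoint="http://example.com/clinical/sparql",
)
```

Keeping the operation next to each predicate lets a planner look up what is allowed for a triple pattern with a single dictionary access.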
  • 17. Privacy-aware Operations. The access control policy of a dataset is composed of privacy-aware operations that can be performed on properties in the dataset. (1) Join Local: the join operation is allowed only at the server (on the premises of the publisher/owner). (2) Join at Federation Mediator: the join operation is allowed both at the server and at the mediator, i.e., the federation engine; values cannot be projected out of the mediator. (3) Project (public): the values of the property p can be projected from the dataset to the user; the join operation is allowed both at the server and at the mediator.
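
The three operations on slide 17 form an increasing order of permissiveness, which the following Python sketch encodes as an ordered enumeration. The names `AccessOp`, `NONE`, and the three helper functions are hypothetical; `NONE` stands for the "No Operation Permitted" annotation from the earlier slides.

```python
from enum import IntEnum


class AccessOp(IntEnum):
    """Operations ordered from least to most permissive (illustrative names)."""
    NONE = 0           # no operation permitted
    JOIN_LOCAL = 1     # join only at the server of the publisher
    JOIN_MEDIATOR = 2  # join at the server or at the federation mediator
    PROJECT = 3        # values may be projected to the user


def can_join_at_server(op: AccessOp) -> bool:
    return op >= AccessOp.JOIN_LOCAL


def can_join_at_mediator(op: AccessOp) -> bool:
    return op >= AccessOp.JOIN_MEDIATOR


def can_project_to_user(op: AccessOp) -> bool:
    return op == AccessOp.PROJECT
```

Under this encoding, a property marked `JOIN_MEDIATOR` can be joined at the federation engine, but its values cannot be returned to the user.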
  • 18. Privacy-aware Source Selection & Decomposition. (1) BOUNCER creates a query decomposition with service graph patterns (SGPs) of star-shaped subqueries built according to PRDF-MTs; a star-shaped subquery (SSQ) is a set of triple patterns with the same subject. (2) BOUNCER respects the privacy and access control policies of the selected sources; it minimizes execution time and maximizes answer completeness by selecting only relevant sources that allow for at least one access operation.
  • 19. Star-shaped Subqueries (SSQs). The query from slide 7 decomposes into four SSQs (SSQ1 to SSQ4), one per subject variable: triple patterns 1, 2 (?lbiop); 3, 4, 5 (?patient); 6, 7, 8, 9 (?cmut); 10, 11 (?gene). The SSQs are linked by ex:biopsy, ex:mutation_aa, and ex:located_in.
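
Since an SSQ is defined as a set of triple patterns with the same subject, the decomposition can be sketched in Python as a grouping by subject. This is only an illustration of that definition, not BOUNCER's actual code; the function name `star_shaped_subqueries` is hypothetical, and the triple patterns are copied from the query with their slide numbers.

```python
from collections import defaultdict

# (number, subject, predicate, object) for the 11 triple patterns of the query
TRIPLES = [
    (1, "?lbiop", "ex:mutation_aa", "?mutation"),
    (2, "?lbiop", "ex:targetTotal", "?targetTotal"),
    (3, "?patient", "ex:biopsy", "?lbiop"),
    (4, "?patient", "ex:smoking", '"False"'),
    (5, "?patient", "ex:egfr_mutated", '"True"'),
    (6, "?cmut", "ex:mutation_aa", "?mutation"),
    (7, "?cmut", "ex:mentioned_in", "?pubmedid"),
    (8, "?cmut", "ex:mutation_loci", "?loci"),
    (9, "?cmut", "ex:located_in", "?gene"),
    (10, "?gene", "ex:gene_name", '"EGFR"'),
    (11, "?gene", "ex:acc_num", "?accNum"),
]


def star_shaped_subqueries(triples):
    """Group triple pattern numbers by their subject."""
    groups = defaultdict(list)
    for number, subject, _predicate, _obj in triples:
        groups[subject].append(number)
    return dict(groups)


ssqs = star_shaped_subqueries(TRIPLES)
```

The four groups correspond to the pattern sets 1-2, 3-5, 6-9, and 10-11 that appear in the source selection slides.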
  • 20. BOUNCER: Privacy-aware Source Selection. Each SSQ is matched to a PRDF-MT and its source, with one AC operation per triple pattern. Patient: patterns 3, 4, 5 (JoinLocal each). Liquid Biopsy: patterns 1, 2 (JoinLocal each). Mutation: patterns 6, 7, 8, 9 (Public each). Gene: patterns 10, 11 (Public each). The SSQs are linked by ex:biopsy, ex:mutation_aa, and ex:located_in.
  • 21. BOUNCER: Privacy-aware Source Selection. SSQs selected for the same source are grouped: Patient & Liquid Biopsy (patterns 1 to 5, JoinLocal each) and Mutation & Gene (patterns 6 to 11, Public each), connected by ex:mutation_aa. Joins can be performed locally.
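
A rough Python sketch of the two steps on slides 20 and 21: match each SSQ to a template that permits an operation on every one of its predicates, then group the SSQs that land on the same source. The catalogue layout, the source names `clinical` and `genomic`, and both function names are assumptions; the predicate `ex:gene_name` follows the spelling used in the query (the template slides spell it `ex:geneName`).

```python
from collections import defaultdict

# Hypothetical catalogue: template -> (source, {predicate: AC operation})
TEMPLATES = {
    "Patient": ("clinical", {"ex:biopsy": "JoinLocal", "ex:smoking": "JoinLocal",
                             "ex:egfr_mutated": "JoinLocal"}),
    "LiquidBiopsy": ("clinical", {"ex:mutation_aa": "JoinLocal",
                                  "ex:targetTotal": "JoinLocal"}),
    "Mutation": ("genomic", {"ex:mutation_aa": "Public", "ex:mutation_cds": "Public",
                             "ex:mentioned_in": "Public", "ex:mutation_loci": "Public",
                             "ex:located_in": "Public"}),
    "Gene": ("genomic", {"ex:gene_name": "Public", "ex:acc_num": "Public"}),
}

# Predicates of each star-shaped subquery, keyed by subject variable
SSQS = {
    "?lbiop": ["ex:mutation_aa", "ex:targetTotal"],
    "?patient": ["ex:biopsy", "ex:smoking", "ex:egfr_mutated"],
    "?cmut": ["ex:mutation_aa", "ex:mentioned_in", "ex:mutation_loci", "ex:located_in"],
    "?gene": ["ex:gene_name", "ex:acc_num"],
}


def select_template(predicates):
    """Return (template, source) for the first template that permits an
    operation on every predicate of the SSQ, or None if there is none."""
    for name, (source, ops) in TEMPLATES.items():
        if all(ops.get(p) for p in predicates):
            return name, source
    return None


def group_by_source(ssqs):
    """Group SSQ subjects by the source selected for them."""
    grouped = defaultdict(list)
    for subject, predicates in ssqs.items():
        match = select_template(predicates)
        if match is None:
            raise ValueError(f"no source permits an operation for {subject}")
        grouped[match[1]].append(subject)
    return dict(grouped)


groups = group_by_source(SSQS)
```

Note that `ex:mutation_aa` occurs in two templates; the other predicates of each SSQ disambiguate which template is selected.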
  • 22. BOUNCER: Privacy-aware Query Planning. BOUNCER applies a greedy, heuristic-based approach to generate a bushy plan whose leaves correspond to star-shaped subqueries (SSQs). BOUNCER finds a valid plan that respects the privacy policy of the data sources, selecting physical operators based on the access policy of the data sources.
  • 23. BOUNCER: Privacy-aware Query Planning. Plan: Patient & Liquid Biopsy (patterns 1 to 5) is joined with Mutation & Gene (patterns 6 to 11) by a Dependent JOIN. The operator respects the privacy policy of binding values for ex:mutation_aa.
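
One way to read the operator choice on slide 23, sketched in Python under a stated assumption: a dependent join sends bindings from the less restricted side into the subquery of the restricted source, so the join on the restricted values happens at their own server. The function `choose_join`, the operation labels, and the rule for two public sides (where either operator would be valid) are illustrative and not taken from the slides.

```python
# Operations that allow a join outside the source's own server
MEDIATOR_OK = {"JoinMediator", "Public"}


def choose_join(left_op, right_op):
    """Pick a join operator from the AC operation on the join variable of
    each side; return None when no valid operator exists in this sketch."""
    if left_op in MEDIATOR_OK and right_op in MEDIATOR_OK:
        # Both sides may be joined at the mediator.
        return "independent join at the mediator"
    if left_op == "JoinLocal" and right_op in MEDIATOR_OK:
        # Restricted left side: ship right-side bindings to the left source.
        return "dependent join: bind right-side values into the left subquery"
    if right_op == "JoinLocal" and left_op in MEDIATOR_OK:
        return "dependent join: bind left-side values into the right subquery"
    # Two restricted sides at different sources, or no operation permitted.
    return None


# ex:mutation_aa is JoinLocal at the clinical source and Public at the genomic one
operator = choose_join("JoinLocal", "Public")
```

For the running example this selects a dependent join, in line with the plan shown on the slide.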
  • 25. Experimental Setup. Research Questions (RQs): RQ1) Does privacy-aware enforcement employed during source selection, query decomposition, and planning impact query execution time? RQ2) Can privacy-aware policies be used to identify query plans that enhance execution time and answer completeness?
  • 26. Experimental Setup. Benchmark: the Berlin SPARQL Benchmark (BSBM) dataset, with 200M triples and 14 queries. Metrics: execution time, the elapsed time between the submission of a query to an engine and the delivery of the answers (timeout: 300 sec); throughput, the number of answers produced per second.
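
The two metrics can be written down as a small Python sketch. The function names and the sample numbers in the usage lines are made up for illustration; only the 300-second timeout and the definition of throughput come from the slide.

```python
TIMEOUT_SEC = 300.0  # timeout stated on the slide


def timed_out(elapsed_sec: float) -> bool:
    """True when a query reached the timeout."""
    return elapsed_sec >= TIMEOUT_SEC


def throughput(num_answers: int, elapsed_sec: float) -> float:
    """Number of answers produced per second of execution time."""
    if elapsed_sec <= 0:
        raise ValueError("elapsed time must be positive")
    return num_answers / elapsed_sec


# Made-up sample values: 600 answers delivered in 120 seconds
rate = throughput(600, 120.0)
```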
  • 27. Experiment I: Decomposition and Planning Time. Goal: assess the impact of access control enforcement during source selection, decomposition, and planning on the overall query execution time (RQ1). All properties in the federation are public. MULDER and BOUNCER are compared on: (a) decomposition and planning time; (b) query execution time.
  • 28. Experiment I: Decomposition and Planning Time. BOUNCER consumes more time in query decomposition and planning.
  • 29. Experiment I: Decomposition and Planning Time. BOUNCER consumes more time in query decomposition and planning. Enforcing privacy and access control is costly.
  • 30. Experiment I: Overall Execution Time. BOUNCER query plans speed up execution time. Effective and valid plans can be identified. Figure annotation: Simple query.
  • 31. Experiment II. Goal: assess the impact of privacy-aware query plans. Privacy policy: (a) Local Join: all properties of Person, Producer, Product, and ProductFeature; (b) Project (Public): all properties of Offer, Review, ProductType, and Vendor. Federated query engines: FedX, ANAPSID, MULDER, BOUNCER.
  • 32. Experiment II: Efficiency of Query Plans. Existing engines can generate executable plans by chance.
  • 33. Experiment II: Efficiency of Query Plans. Existing engines can generate executable plans by chance. BOUNCER always generates valid plans. BOUNCER valid plans can be more efficient than other plans.
  • 34. Conclusions and Lessons Learned. Sources and data privacy policies can be described in terms of PRDF-MTs. BOUNCER is a privacy-aware federated query engine.
  • 35. Conclusions and Lessons Learned. Sources and data privacy policies can be described in terms of PRDF-MTs. BOUNCER is a privacy-aware federated query engine. Enforcing data privacy and access control is costly. Efficient and valid plans can be identified.
  • 36. Conclusions and Lessons Learned. Sources and data privacy policies can be described in terms of PRDF-MTs. BOUNCER is a privacy-aware federated query engine. Enforcing data privacy and access control is costly. Efficient and valid plans can be identified. BOUNCER always identifies valid plans and can outperform existing federated SPARQL engines.
  • 37. Thank you for your attention! CONTACT: Kemele M. Endris, Forschungszentrum L3S, Leibniz Universität Hannover, Welfengarten 1B, 30167 Hannover, Germany; email: endris@L3S.de; phone: +49-151-762-14695. Prof. (Uni. Simon Bolivar) Dr. Maria-Esther Vidal, Scientific Data Management, Technische Informationsbibliothek (TIB), Welfengarten 1B, 30167 Hannover, Germany; email: Maria.Vidal@tib.eu; phone: +49-115-762-14690. wdaqua.eu, Innovation Training Network (ITN).