PRiSM Lab. - UMR 8144
Privacy Preserving SQL Query
Execution on Distributed Data
Quoc-Cuong To, Benjamin Nguyen, Philippe Pucheral
SMIS Project
LaHDAK Seminar
Orsay, 4th March 2014
Université de Versailles et St-Quentin
INRIA Rocquencourt
CNRS
PART I
The New Oil
I. The New Oil
II. Trusted Cells
III. Global SQL Queries
IV. Cost Model and
Experiments
V. Conclusion
Mass-generation of
(personal) data
Data sources have mostly turned digital
Analog processes
e.g., photography, films
Paper-based interactions
e.g., banking, e-administration
Communications
e.g., email, SMS, MMS, Skype
Where is your personal data? … In data centers
112 new emails per day → Mail servers
65 SMS sent per day → Telcos
800 pages of social data → Social networks
Web searches, list of purchases → Google, Amazon
People recording
People listening
St Peter's Place, Roma
WHY? Is this a problem?
Everything is free… 
Information
Extraction
Personal data is the new oil
Is this good news?
• $2 billion a year spent by US companies on third-party data about individuals (Forrester Report)
• $44.25 is the estimated return on $1 invested in email marketing (oil is up to 0.5$/yr)
High Market Value Companies
• Facebook: value / #accounts ≈ 50$
• Google: $38 billion business sells ads based on how people search the Web
• Amazon (knows purchase intent), mail order systems companies (gmail), loyalty programs (supermarkets), banks & insurance, employment market (LinkedIn, Viadeo), travel & transportation (voyages-sncf), the « love » market (meetic), etc.
Personal data is the new oil
How would oil companies behave ?
• Exploit your oil field for free → Know all about you
• Offer “extra” services → Refine their knowledge
• Provide real services to their paying customers
(e.g. advertisement and profiling, location tracking and spying, …)
In other words : your personal data would be
processed by sophisticated data refineries…
REGARDLESS OF YOUR PRIVACY !
It’s the business model…
… or bad news ?
Their choice
Your choice
Is the current centralised model good wrt privacy
protection?
Intrinsic problem #1: personal data is exposed to sophisticated attacks
–High benefits to successful hack
–One person negligence may affect millions
Intrinsic problem #2: personal data is hostage of sudden privacy changes
–Centralised administration of data means delegation of control
–This leads to regular changes, with application (and business) evolution, with mergers and acquisitions, etc. (e.g., Facebook 2012)
Increasing security is only a partial solution since it does not solve those intrinsic limitations
E.g., TrustedDB [BS12] proposes tamper-resistant hardware to secure
outsourced centralized databases.
A New Hope
A Personal Data Ecosystem…
… built around user-centricity and trust,
achieved through a decentralized architecture
THE TRUSTED CELL !
I want my
privacy
back !!
Our goals:
• Preserve current USER functionalities
• Hinder uncontrolled data exploitation & privacy violations
Our targets:
• General Data Management Applications: SQL
• “Low cost” solutions (i.e., acceptable by the general public)
PART II
Trusted Cells
The Secure (Trusted) Personal Data Server Approach
[AAB+10]
Personal database is
• Well-organized
• Tamper resistant
• Controlled by the owner
(sharing, retention, audit)
• Accessible in disconnected
mode
Personal
data
Hospital
Doctor’s office
My bank
My employer
My telco Private
application
Secure
multi-actors
application
Secure
global queries
External
application
e.g., epidemiological study
e.g., medical care
Doctor
Nurse
e.g., budget optimization
e.g., financial help
Bob
Approach characteristics :
• Based on tamper-resistant HW
• Well Structured World (R-DB, limited apps)
• Uniform equipment
TRUSTED CELL
FLASH
(GB size)
RAM
FLASH
NOR
CPU Crypto
PDS generic code
Relational DBMS
Operating System
Certified Encrypted
Personal
Database
Why trust personal secure HW solutions?
1. Users store their own data
→ minimize abusive usage
2. Auto-administered platform
→ no DBA attack (even by user)
3. Enforce privacy principles for externalized (shared) data
→ best if the recipient of the data is another TC
4. Tamper-resistance + certified code/secure execution + single user + physical access needed
→ ratio cost/benefit of an attack is very high
Tamper resistance
Gemalto secure token
SMIS token (ZED)
Trust Zone architecture
Dedicated HW device
PC ? (social trust / open
source)
The Trusted Cell Asymmetric Architecture
Durability,
Availability
Secure
Computation
Export Data
TC asymmetric architecture
Built using Secure Portable Tokens as Trusted Cells (called here Trusted Data Server or TDS) /
Cloud as Supporting Server Infrastructure (SSI).
Challenges :
Local (Embedded) data management (not my work : Anciaux, Bouganim, Pucheral et al.)
Global querying (Part III)
Data export management (MinExp Project with CG78 & LIX)
HIGH POWER / AVAILABILITY
LOW / NO TRUST
LOW POWER / AVAILABILITY
HIGH TRUST
ASYMMETRIC
Encrypted
Private Data Generated
(e.g. sensor)
PART III
Global SQL Queries on the Asymmetric
Architecture
Example Trusted Cell : a Trusted Data Server (TDS)
Token Characteristics :
• High security:
• High ratio Cost/Benefit of an attack;
• Secure against its owner;
• Modest computing resources (~10Kb of RAM, 50MHz
CPU);
• Low availability: physically controlled by its owner; connects and disconnects at will
How to compute global queries over decentralized
personal data stores while respecting users’ privacy?
Authorized
Querier
Average Salary
in Orsay
Unauthorized
Querier
TC can be :
Unbreakable (honest)
Broken (Weakly Malicious)
Infrastructure (SSI) can be:
Honest but curious (Semi-honest)
Weakly-Malicious (Covert Adversary
= does not want to be detected)
Secure Global Computation on TCs
PROBLEM :
How to perform global queries on the asymmetric
architecture? (i.e. using data from many/all cells)
The « classical » problem of Secure Global Computation (e.g., SMC) is more general and makes no trust assumption.
THREAT MODEL :
HBC + Unbreakable → “simple protocols” presented here (EDBT’14 [TNP14])
WM + Broken → Must be prevented! (via security primitives) see [ANP13]
Is this a new problem ?
Several approaches are possible to securely perform global computations:
1. Use only an untrusted server/cloud/P2P and use generic (and costly) algorithms.
(e.g. Secure Multi-Party Computing [Yao82, GMW87, CKL06], fully homomorphic
encryption [Gent09]) Problem = COST
2. Use only an untrusted server/cloud/P2P and develop a specific algorithm for
each specific class of queries or applications. (e.g. DataMining Toolkit [CKV+02])
Problem = GENERICITY
3. Introduce a tangible element of trust, through the use of a trusted
component and develop a generic methodology to execute any centralized
algorithm in this context. ([Katz07, GIS+10, AAB+10]) Problem = TRUST
Hypothesis on Querier and SSI
Querier:
• Shares the secret key with TDSs (to encrypt the query and decrypt the result).
• Classical access control policy (e.g., RBAC):
– Cannot get the raw data stored in TDSs (gets only the final result)
– Can obtain only authorized views of the dataset (inferential attacks are out of scope)
Supporting Server Infrastructure:
• Does not know the query (and so the attributes in the GROUP BY clause) because the query is encrypted by the Querier before being sent to the SSI.
• Has prior knowledge about data distribution.
• Honest-but-curious attacker: frequency-based attack
– SSI matches plaintext and ciphertext values that have the same frequency.
e.g., it investigates remarkable (very high/low) frequencies in the dataset distribution
(e.g., X is the only person with a given (high) age who is still working and earning money → if I find a group with only one member, I can deduce that X participates in the dataset).
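The frequency-based attack can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the grouping attribute is deterministically encrypted and that the SSI knows exact plaintext counts; the tokens, the `frequency_attack` helper and the data are hypothetical, not taken from the talk.

```python
from collections import Counter

def frequency_attack(ciphertexts, known_distribution):
    """Guess plaintexts by matching ciphertext and plaintext frequencies.

    Only frequencies that are unique on both sides give an unambiguous match.
    """
    cipher_by_count = {}
    for token, count in Counter(ciphertexts).items():
        cipher_by_count.setdefault(count, []).append(token)
    plain_by_count = {}
    for value, count in known_distribution.items():
        plain_by_count.setdefault(count, []).append(value)

    guesses = {}
    for count, tokens in cipher_by_count.items():
        values = plain_by_count.get(count, [])
        if len(tokens) == 1 and len(values) == 1:
            guesses[tokens[0]] = values[0]
    return guesses

# Hypothetical deterministic encryption: each age always maps to the same token.
token_of = {25: "#x3Z", 45: "$f2&", 92: "T?f2"}
ages = [25] * 5 + [45] * 5 + [92]          # one very old worker
ciphertexts = [token_of[a] for a in ages]  # what the SSI observes
prior = {25: 5, 45: 5, 92: 1}              # SSI's prior knowledge (assumed exact)

guesses = frequency_attack(ciphertexts, prior)
```

Here the singleton group is the only one the attacker can identify; the two groups with equal frequency remain indistinguishable from each other.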
Solution Overview
1) Query
Supporting Server
Infrastructure (SSI)
…
SELECT <attribute(s) and/or aggregate function(s)>
FROM <Table(s) / SPTs>
[WHERE <condition(s)>]
[GROUP BY <grouping attribute(s)>]
[HAVING <grouping condition(s)>]
[SIZE <size condition(s)>];
2) Collection and
Filtering phase
3) Aggregation phase
Stop condition: max #tuples or max time
John, 35K Mary, 43K Paul, 100K
SELECT age, AVG(salary)
FROM user
WHERE town = 'Orsay'
GROUP BY age
HAVING MIN(salary) > 0
SIZE
4) Aggregate
Filtering phase
Proposed Solutions
The main difficulty is with AGGREGATE QUERIES !!
Solutions vary depending on which kind of encryption is used, how
the SSI constructs the partitions, and what information is revealed to
the SSI.
• Secure aggregation solution
• Noise-based solutions
– random (white) noise
– noise controlled by the complementary domain
• Histogram-based solutions
We investigate these solutions along the directions of performance and security.
Secure Aggregation
[Figure: Secure Aggregation protocol]
Querier: sends Qi = <EK1(Q), Cred, Size> through the Supporting Server Infrastructure (SSI)
Q: SELECT Age, AVG(Salary) WHERE city = Orsay GROUP BY Age HAVING Min(Salary) > 0
Each TDS: decrypts Qi, checks AC rules, and encrypts its data using non-deterministic encryption
Example TDS data: (25y, Orsay, 35K), (45y, Orsay, 43K), (53y, Paris, 100K)
TDS outputs shown: (#x3Z, aW4r), ($f2&, bG?3), “No answer ?”
SSI: forms partitions (each fits the resources of a TDS) and holds partial aggregations (Gij, AGGk)
Partial aggregation: (25, 35K), (45, 43K), (45, 37K) → (25, [35K,1]), (45, [40K,2]), re-encrypted, e.g., (F!d2, s7@z), (ZL5=, w2^Z)
Final Agg: (25, 29.5K), (45, 43.7K), …
Evaluate HAVING clause → Final Result, e.g., (#f4R, bZ_a), (Ye”H, fw%g)
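The partial-aggregation step can be sketched in Python as follows. This is only an illustration under stated assumptions: encryption and decryption are omitted, partial states are kept as [sum, count, min] (the figure shows [average, count]), and the function names `partial_aggregate`, `merge` and `finalize` are hypothetical.

```python
def partial_aggregate(tuples):
    """Fold (age, salary) tuples of one partition into {age: [sum, count, min]}."""
    state = {}
    for age, salary in tuples:
        if age not in state:
            state[age] = [salary, 1, salary]
        else:
            s = state[age]
            s[0] += salary
            s[1] += 1
            s[2] = min(s[2], salary)
    return state

def merge(a, b):
    """Combine two partial aggregations, as a TDS would in a later round."""
    out = {age: list(s) for age, s in a.items()}
    for age, (total, count, lowest) in b.items():
        if age in out:
            out[age][0] += total
            out[age][1] += count
            out[age][2] = min(out[age][2], lowest)
        else:
            out[age] = [total, count, lowest]
    return out

def finalize(state):
    """Apply HAVING MIN(salary) > 0, then compute AVG(salary) per age."""
    return {age: total / count
            for age, (total, count, lowest) in state.items() if lowest > 0}

# Two partitions formed by the SSI (salaries in K, values from the slide example).
partitions = [[(25, 35), (45, 43)], [(45, 37)]]
merged = {}
for part in partitions:
    merged = merge(merged, partial_aggregate(part))
result = finalize(merged)
```

Keeping sums and counts rather than averages makes the merge associative, so partitions can be combined in any order across rounds.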
Noise Based Protocols
Secure Aggregation efficiency problem:
nDet_Enc on AG → SSI cannot gather tuples belonging to the same group into the same partition.
But:
Det_Enc on AG → frequency-based attack.
Idea:
Add noise (fake tuples) to hide the distribution of AG.
How many fake tuples (nf) are needed? → depends on the disparity in frequencies among AG
– small nf: random noise
– big nf: white noise
– nf = n-1: controlled noise (n: AG domain cardinality)
Efficiency:
– Each TDS handles tuples belonging to one group (instead of a large partial aggregation as in SAgg)
– However, high cost of generating and processing the very large number of fake tuples
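The controlled-noise case (nf = n-1) can be sketched as below. This is a toy Python illustration, not the protocol itself: the group value stands in for its Det_Enc ciphertext, the payload dictionary stands in for an nDet_Enc ciphertext carrying a fake flag, and `tds_messages` and `aggregate` are hypothetical names.

```python
from collections import Counter, defaultdict

def tds_messages(true_group, salary, domain):
    """One true tuple plus nf = n - 1 fake tuples, one per other group value."""
    messages = [(true_group, {"salary": salary, "fake": False})]
    messages += [(g, {"salary": 0, "fake": True})
                 for g in domain if g != true_group]
    return messages

def aggregate(messages):
    """What a TDS does after decryption: drop fake tuples, then average."""
    sums, counts = defaultdict(int), defaultdict(int)
    for group, payload in messages:
        if payload["fake"]:
            continue
        sums[group] += payload["salary"]
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in counts}

domain = [25, 35, 45]                    # AG domain, cardinality n = 3
cells = [(25, 35), (45, 43), (45, 37)]   # (age, salary in K) held by three TDSs
sent = [m for age, salary in cells for m in tds_messages(age, salary, domain)]

seen_by_ssi = Counter(group for group, _ in sent)  # frequencies visible to the SSI
averages = aggregate(sent)
```

Every group value reaches the SSI equally often, so frequencies reveal nothing, at the price of n times as many tuples to transfer and process.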
Nearly Equi-Depth Histogram Solution
1. Distribution of AG is discovered and distributed to all TDSs.
2. Each TDS allocates its tuple to the corresponding bucket.
3. Each TDS sends to SSI: {h(bucketId), nDet_Enc(tuple)}
Consequences:
• We do not generate & process too many fake tuples
• We do not handle too large a partial aggregation
[Figure: true distribution vs. nearly equi-depth histogram]
Problem: the distribution must be discovered
→ This can be done “offline” using secure aggregation!
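A minimal Python sketch of the bucket construction follows. The greedy strategy, the example distribution and the unkeyed SHA-256 tag are assumptions made for illustration; the talk does not specify how buckets are built or which hash h is used.

```python
import hashlib

def nearly_equi_depth(distribution, num_buckets):
    """Greedy split of sorted values into buckets of roughly equal total count."""
    target = sum(distribution.values()) / num_buckets
    buckets, current, depth = [], [], 0
    for value in sorted(distribution):
        current.append(value)
        depth += distribution[value]
        # Close the bucket once it is deep enough, keeping one bucket for the tail.
        if depth >= target and len(buckets) < num_buckets - 1:
            buckets.append(current)
            current, depth = [], 0
    if current:
        buckets.append(current)
    return buckets

def bucket_tag(value, buckets):
    """h(bucketId): the only grouping information the SSI gets to see."""
    bucket_id = next(i for i, b in enumerate(buckets) if value in b)
    return hashlib.sha256(str(bucket_id).encode()).hexdigest()[:8]

# Hypothetical age distribution discovered offline (age -> number of tuples).
distribution = {20: 10, 25: 40, 30: 50, 35: 45, 40: 5, 45: 50}
buckets = nearly_equi_depth(distribution, 4)
depths = [sum(distribution[v] for v in b) for b in buckets]
```

Rare values such as 20 and 40 share a bucket with a neighbour, so the SSI sees a flat distribution of tags without any fake tuples being generated.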
Information Exposure Analysis (DCJP+03)
To measure Information Exposure, we consider the probability that an attacker (here the Honest-but-Curious SSI) can reconstruct the plaintext table (or part of the table) using the encrypted table and their prior knowledge about global distributions of plaintext attributes.
Information Exposure is noted:
$$\epsilon = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{k} IC_{i,j}$$
• n is the number of tuples
• k is the number of attributes
• IC_{i,j} is the value in row i and column j of the inverse cardinality table (= 1/number of plaintext values that could correspond)
• N_j is the number of distinct plaintext values in the global distribution of the attribute in column j (i.e., N_j ≤ n).
Information Exposure Analysis (Damiani et al. CCS 2003)
• n: the number of tuples
• k: the number of attributes
• IC_{i,j}: IC for row i and column j
• N_j: the number of distinct plaintext values in the global distribution of the attribute in column j (i.e., N_j ≤ n)
SAgg: IC_{i,j} = 1/N_j for all i, j
$$\epsilon_{S\_Agg} = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{k}\frac{1}{N_j} = \prod_{j=1}^{k}\frac{1}{N_j}$$
EDHist: requires finding all possible partitions of the plaintext values such that the sum of their occurrences is the cardinality of the hashed value: NP-Hard multiple subset sum problem
$$\min(\epsilon_{ED\_Hist}) = \prod_{j=1}^{k}\frac{1}{N_j}$$
Noise_based & ED_Hist have a uniform distribution of the AG: ɛED_Hist = ɛNoise_based
Plaintext:
$$\epsilon_{P\_Text} = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{k} 1 = 1$$
ɛS_Agg ≤ ɛED_Hist = ɛNoise_based < 1
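As a worked check, the exposure coefficient (the average over the n rows of the product of the k inverse cardinalities in that row) can be computed directly. The sketch below is in Python with exact fractions; the helper `exposure` and the values n = 3, N = [4, 5] are hypothetical and chosen only to illustrate the two closed forms for SAgg and plaintext.

```python
from fractions import Fraction
from math import prod

def exposure(ic):
    """Average over rows of the product of inverse cardinalities in each row."""
    n = len(ic)
    return sum(prod(row) for row in ic) / n

n = 3        # number of tuples (hypothetical)
N = [4, 5]   # distinct plaintext values per attribute (hypothetical), k = 2

# SAgg: non-deterministic encryption, every value could be any of the N_j values.
ic_sagg = [[Fraction(1, Nj) for Nj in N] for _ in range(n)]
eps_sagg = exposure(ic_sagg)

# Plaintext: every cell is known exactly, so each inverse cardinality is 1.
ic_plain = [[Fraction(1) for _ in N] for _ in range(n)]
eps_plain = exposure(ic_plain)
```

With these values the SAgg exposure collapses to 1/(4·5), independent of n, while plaintext exposure is 1.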
PART IV
Cost Model and experiments
Unit Test Calibration
Internal time consumption
Eval Board
•32 bit RISC CPU: 120 MHz
•Crypto-coprocessor: AES, SHA
•64KB RAM, 1GB NAND-Flash
•USB full speed: 12 Mbps
SMIS-developed token (ZED electronics)
Same technical characteristics
Price = 50 EUR (small series)
Parameters for cost model
Dataset size Ttuple: varies from 5 to 65 million
Number of groups G: varies from 1 to 10^6
Number of TDSs participating in the computation, as a percentage of all TDSs connected at a given time, Ttds: varies from 1% to 100%.
We fix two parameters and vary the other, measuring: execution time, parallelism of the protocol, total load, maximum load on one TDS
When the parameters are fixed:
Ttuple = 10^6, G = 10^3, % of TDS connected = 10% of Ttuple.
We also compute and use the optimal value for all reduction factors as
well as for.
In the figures, we plot two curves for Rnf_Noise protocols RN (nf = 2) and
WN (nf = 1000) to capture the impact of the ratio of fake tuples.
EXECUTION TIME
[Figure panels: (Ttuple = 10^6; G = 1 to 10^6) and (Ttuple = 5·10^6 to 35·10^6; G = 1000)]
Naïve, noise-based, ED&EW:
• G increases, Ttuple fixed → number of tuples in each group decreases
• Depends only on the total number of tuples in each group (because all groups are processed in parallel) → exeTime decreases when G increases.
Secure Count:
• G increases → time for processing the big partial aggregation increases accordingly.
• Cannot fully exploit parallel computation (cannot divide each group among TDSs in parallel; each TDS has to handle all G groups) → exeTime increases
Naïve, RN, ED&EW:
• Ttuple increases, Ttds increases accordingly → not much change
Secure Count:
• Number of recursive steps increases when Ttuple increases → exeTime increases
WN, CN:
• Number of fake tuples increases linearly with the number of true tuples → exeTime also increases linearly to handle the fake & true tuples
NUMBER OF PARTICIPATING TDSS
[Figure panels: (Ttuple = 10^6; G = 1 to 10^6) and (Ttuple = 5·10^6 to 35·10^6; G = 1000)]
Secure Count:
• G increases → level of convergence is low & the size of each aggregation is big → fewer participating TDSs are needed to build the aggregations and reach a high convergence level
Other solutions:
• Since each group is processed in parallel and independently → when G increases, the level of parallelism increases → more TDSs are needed to participate in the parallel computation
WN, CN:
• When the number of true tuples (Ttuple) increases, the number of fake tuples increases as well → more TDSs are needed to process fake tuples
Secure Count:
• Level of parallelism is lower than in other solutions → needs the fewest TDSs
TOTAL LOAD (NETWORK OVERHEAD)
[Figure panels: (Ttuple = 10^6; G = 1 to 10^6) and (Ttuple = 5·10^6 to 35·10^6; G = 1000)]
Noise-based:
• Highest load because of the fake tuples
• When G increases but Tpds does not change → number of tuples (both true and fake) does not change → total load is the same
Others:
• Lower load since they handle only true data
Noise-based:
• When the number of true tuples (Ttuple) increases, the number of fake tuples increases linearly → total load is highest and increases
MAXIMUM LOAD
[Figure panels: (Ttuple = 10^6; G = 1 to 10^6) and (Ttuple = 5·10^6 to 35·10^6; G = 1000)]
Secure Count:
• When G increases, the size of each aggregation is big → each PDS processes a bigger aggregation
• When G increases, the number of participating PDSs decreases → each participating PDS incurs a higher load
Others:
• When G increases, the number of participating PDSs decreases & the number of tuples in each group decreases → each PDS processes fewer tuples → maxLoad decreases
WN, CN:
• Use all available PDSs → maxLoad increases linearly when Ttuple increases
Others:
• When Ttuple increases, the number of participating PDSs also increases accordingly → in general, maxLoad does not increase much
AVERAGE LOAD
[Figure panels: (Ttuple = 10^6; G = 1 to 10^6) and (Ttuple = 5·10^6 to 35·10^6; G = 1000)]
Secure Count:
• Total load is unchanged but the number of participating TDSs is reduced when G increases → the average load increases.
WN, CN:
• The (high) total load is unchanged & all PTpds = 10^5 participate in the computation → every PDS incurs the same amount of load
Others:
• G increases, more participating PDSs & total load unchanged → AvgLoad decreases
Although TotalLoad(CN) > TotalLoad(SC), PTpds(CN) >> PTpds(SC) → AvgLoad(CN) < AvgLoad(SC)
CONSUMED MEMORY
Actual RAM size of TDS
Noise-based:
• Needs to store only 1 group regardless of G → requires the least RAM.
Histogram-based:
• Each PDS stores h groups (h > 1) regardless of G → requires more RAM
SC:
• Each PDS stores all G groups
• When G increases, the RAM needed increases → requires the most RAM
• Exceeds the actual RAM size → future work
AVERAGE TIME FOR PDS TO CONNECT
[Figure panels: (Ttuple = 10^6; G = 1 to 10^6) and (Ttuple = 5·10^6 to 35·10^6; G = 1000)]
Secure Count:
• The number of participating PDSs is reduced when G increases → the average time increases.
WN, CN:
• The (high) total load is unchanged & all PTpds = 10^5 participate in the computation → every PDS takes the same amount of time to process data
Others:
• G increases, more participating PDSs → AvgTime decreases
High AvgTime:
• WN, CN: because of too many fake tuples
• SC: because of very few participating PDSs
Theoretical Scalability
[Figure panels: Tpds = 1% Ttuple; Tpds = 10% Ttuple; Tpds = 100% Ttuple]
Secure Count: has a (low) maximum number of participants.
Others: WN has higher scalability than the others (in the sense that adding participants counts)
Experimental Scalability
COMPARISON WITH OTHER STATE-OF-THE-ART
METHODS
Hardware:
•Linux workstation;
•AMD Athlon-64 2 GHz processor;
•512 MB memory
•SC: depends mostly on G (slightly on Ttuple)
•Others: do not depend on G, but mostly on Ttuple
Answering aggregation queries in a secure system model (Ge & Zdonik, VLDB 2007)
DES: each value is decrypted and the computation is performed on the plaintext.
Server must have access to secret key & plaintext (violates security requirements)
Paillier: perform computation directly on the ciphertext using a secure
homomorphic encryption scheme: enc(a + b) = enc(a) + enc(b)
Server performs computation without having access to the secret key
or plaintext. In the end, the ciphertexts are passed back to the trusted agent (i.e., the Key Holder) to perform a final decryption and a simple calculation of the final result
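The additive homomorphism can be illustrated with a toy Paillier sketch in Python. Note that in Paillier the operation the server applies to ciphertexts is multiplication modulo n², which decrypts to the sum of the plaintexts; the "+" on the slide is shorthand for that combined operation. The primes are tiny and hypothetical, the generator is fixed to g = n + 1, and the code is for illustration only, not for real use.

```python
import math
import random

def keygen(p, q):
    """Toy key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # inverse of lam mod n
    return n, (lam, mu)

def encrypt(n, m):
    """c = (n+1)^m * r^n mod n^2 with a fresh random r (non-deterministic)."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

n, priv = keygen(293, 433)             # tiny hypothetical primes
c_a, c_b = encrypt(n, 35), encrypt(n, 43)

# The untrusted server combines ciphertexts without the secret key.
c_sum = (c_a * c_b) % (n * n)
total = decrypt(n, priv, c_sum)        # done by the key holder
```

Only the final decryption and the division by the count (for an average) need the trusted key holder, which matches the description above.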
Metrics for the evaluation of the proposed solutions
Total Load
Average Time/Load
Query Response Time
Information Exposure
Throughput
Resource Variation
Trade-off between criteria
Select ..
From ..
Where ..
Group By AG
G = card (AG)
Security:
S_Agg > ED_Hist
Performance:
G > 10:
ED_Hist > S_Agg
G <= 10:
ED_Hist < S_Agg
PART V
Conclusion and perspectives
Short/Middle term research :
Data intensive Computing on an Asymmetric Architecture
SQL
Queries here do not have joins !
Take into account Malicious SSI / Broken Tokens
Field experiment on usability (with ISN)
Private/Secure MapReduce
Investigate compatibility of our protocols.
Develop new protocols.
Check performance !
XML management
Adapt the work on XQ2P (Butnaru, Gardarin, Nguyen) to the Trusted
Cells context.
Distributed Window Queries.
Promoting the Trusted Cells vision
Trusted Cells “Core”
Open hardware and software bundle : basic functionalities
Local DB
Distributed DB
NoSQL DB
 needed to develop PbD personal data management applications !
Promote an open source community around Trusted Cells.
UVSQ FabLab
Bring secure data management to the Versailles FabLab
Beyond Tamper Resistant HW
Results are useable even with lower trust elements.
Include social trust / reputation.
QUESTIONS ?
42
PR
SMPRiSM Lab. - UMR 8144
43
PR
44

More Related Content

What's hot

Towards secure cloud data management
Towards secure cloud data managementTowards secure cloud data management
Towards secure cloud data management
ambitlick
 
Data attribute security and privacy in Collaborative distributed database Pub...
Data attribute security and privacy in Collaborative distributed database Pub...Data attribute security and privacy in Collaborative distributed database Pub...
Data attribute security and privacy in Collaborative distributed database Pub...
International Journal of Engineering Inventions www.ijeijournal.com
 
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTINGDISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
ijcsit
 
Conference Paper: Multistage OCDO: Scalable Security Provisioning Optimizatio...
Conference Paper: Multistage OCDO: Scalable Security Provisioning Optimizatio...Conference Paper: Multistage OCDO: Scalable Security Provisioning Optimizatio...
Conference Paper: Multistage OCDO: Scalable Security Provisioning Optimizatio...
Ericsson
 

What's hot (14)

Towards secure cloud data management
Towards secure cloud data managementTowards secure cloud data management
Towards secure cloud data management
 
Security issues associated with big data in cloud
Security issues associated  with big data in cloudSecurity issues associated  with big data in cloud
Security issues associated with big data in cloud
 
Data attribute security and privacy in Collaborative distributed database Pub...
Data attribute security and privacy in Collaborative distributed database Pub...Data attribute security and privacy in Collaborative distributed database Pub...
Data attribute security and privacy in Collaborative distributed database Pub...
 
Information Security in Big Data : Privacy and Data Mining
Information Security in Big Data : Privacy and Data MiningInformation Security in Big Data : Privacy and Data Mining
Information Security in Big Data : Privacy and Data Mining
 
Solve Big Data Security Issues
Solve Big Data Security IssuesSolve Big Data Security Issues
Solve Big Data Security Issues
 
Improved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providersImproved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providers
 
Adidrds
AdidrdsAdidrds
Adidrds
 
Secure distributed de duplication systems with
Secure distributed de duplication systems withSecure distributed de duplication systems with
Secure distributed de duplication systems with
 
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTINGDISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
 
Conference Paper: Multistage OCDO: Scalable Security Provisioning Optimizatio...
Conference Paper: Multistage OCDO: Scalable Security Provisioning Optimizatio...Conference Paper: Multistage OCDO: Scalable Security Provisioning Optimizatio...
Conference Paper: Multistage OCDO: Scalable Security Provisioning Optimizatio...
 
IRJET- An EFficiency and Privacy-Preserving Biometric Identification Scheme i...
IRJET- An EFficiency and Privacy-Preserving Biometric Identification Scheme i...IRJET- An EFficiency and Privacy-Preserving Biometric Identification Scheme i...
IRJET- An EFficiency and Privacy-Preserving Biometric Identification Scheme i...
 
a hybrid cloud approach for secure authorized
a hybrid cloud approach for secure authorizeda hybrid cloud approach for secure authorized
a hybrid cloud approach for secure authorized
 
Big data security_issues_research_paper
Big data security_issues_research_paperBig data security_issues_research_paper
Big data security_issues_research_paper
 
CLOUD COMPUTING -Risks, Countermeasures, Costs and Benefits-
CLOUD COMPUTING -Risks, Countermeasures, Costs and Benefits-CLOUD COMPUTING -Risks, Countermeasures, Costs and Benefits-
CLOUD COMPUTING -Risks, Countermeasures, Costs and Benefits-
 

Viewers also liked (7)

Charles a z
Charles a zCharles a z
Charles a z
 
Alternativas en la nube
Alternativas en la nube Alternativas en la nube
Alternativas en la nube
 
Web 2.0
Web 2.0Web 2.0
Web 2.0
 
Vektorkoordinater
VektorkoordinaterVektorkoordinater
Vektorkoordinater
 
Indexation de photos sociales par propagation sur une hiérarchie de concepts
Indexation de photos sociales par propagation sur une hiérarchie de conceptsIndexation de photos sociales par propagation sur une hiérarchie de concepts
Indexation de photos sociales par propagation sur une hiérarchie de concepts
 
Proyecto De Analisis
Proyecto De AnalisisProyecto De Analisis
Proyecto De Analisis
 
How to... WIKI
How to... WIKIHow to... WIKI
How to... WIKI
 

Similar to Talk Benjamin NGUYEN

Iaetsd enhancement of performance and security in bigdata processing
Iaetsd enhancement of performance and security in bigdata processingIaetsd enhancement of performance and security in bigdata processing
Iaetsd enhancement of performance and security in bigdata processing
Iaetsd Iaetsd
 
iaetsd Using encryption to increase the security of network storage
iaetsd Using encryption to increase the security of network storageiaetsd Using encryption to increase the security of network storage
iaetsd Using encryption to increase the security of network storage
Iaetsd Iaetsd
 

Similar to Talk Benjamin NGUYEN (20)

Iaetsd enhancement of performance and security in bigdata processing
Iaetsd enhancement of performance and security in bigdata processingIaetsd enhancement of performance and security in bigdata processing
Iaetsd enhancement of performance and security in bigdata processing
 
Personal & Trusted cloud
Personal & Trusted cloudPersonal & Trusted cloud
Personal & Trusted cloud
 
Cyber security within Organisations: A sneaky peak of current status, trends,...
Cyber security within Organisations: A sneaky peak of current status, trends,...Cyber security within Organisations: A sneaky peak of current status, trends,...
Cyber security within Organisations: A sneaky peak of current status, trends,...
 
IBM Share Conference 2010, Boston, Ulf Mattsson
IBM Share Conference 2010, Boston, Ulf MattssonIBM Share Conference 2010, Boston, Ulf Mattsson
IBM Share Conference 2010, Boston, Ulf Mattsson
 
Cloud & Big Data - Digital Transformation in Banking
Cloud & Big Data - Digital Transformation in Banking Cloud & Big Data - Digital Transformation in Banking
Cloud & Big Data - Digital Transformation in Banking
 
IRJET- A Novel Framework for Three Level Isolation in Cloud System based ...
IRJET-  	  A Novel Framework for Three Level Isolation in Cloud System based ...IRJET-  	  A Novel Framework for Three Level Isolation in Cloud System based ...
IRJET- A Novel Framework for Three Level Isolation in Cloud System based ...
 
50120140507005 2
50120140507005 250120140507005 2
50120140507005 2
 
50120140507005
5012014050700550120140507005
50120140507005
 
data mining with big data
data mining with big datadata mining with big data
data mining with big data
 
IJARCCE 20
IJARCCE 20IJARCCE 20
IJARCCE 20
 
Cloud computing final show
Cloud computing final   showCloud computing final   show
Cloud computing final show
 
iaetsd Using encryption to increase the security of network storage
iaetsd Using encryption to increase the security of network storageiaetsd Using encryption to increase the security of network storage
iaetsd Using encryption to increase the security of network storage
 
SoleraNetworks
SoleraNetworksSoleraNetworks
SoleraNetworks
 
Big Data & Security Have Collided - What Are You Going to do About It?
Big Data & Security Have Collided - What Are You Going to do About It?Big Data & Security Have Collided - What Are You Going to do About It?
Big Data & Security Have Collided - What Are You Going to do About It?
 
A Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud StorageA Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
 
Multi- Level Data Security Model for Big Data on Public Cloud: A New Model
Multi- Level Data Security Model for Big Data on Public Cloud: A New ModelMulti- Level Data Security Model for Big Data on Public Cloud: A New Model
Multi- Level Data Security Model for Big Data on Public Cloud: A New Model
 
PyConline AU 2021 - Things might go wrong in a data-intensive application
PyConline AU 2021 - Things might go wrong in a data-intensive applicationPyConline AU 2021 - Things might go wrong in a data-intensive application
PyConline AU 2021 - Things might go wrong in a data-intensive application
 
ISSA Boston - PCI and Beyond: A Cost Effective Approach to Data Protection
ISSA Boston - PCI and Beyond: A Cost Effective Approach to Data ProtectionISSA Boston - PCI and Beyond: A Cost Effective Approach to Data Protection
ISSA Boston - PCI and Beyond: A Cost Effective Approach to Data Protection
 
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A privacy leakage upper bound constra...
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A privacy leakage upper bound constra...DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A privacy leakage upper bound constra...
DOTNET 2013 IEEE CLOUDCOMPUTING PROJECT A privacy leakage upper bound constra...
 


Talk Benjamin NGUYEN

  • 1. PR SM PRiSM Lab. - UMR 8144 Privacy Preserving SQL Query Execution on Distributed Data Quoc-Cuong To, Benjamin Nguyen, Philippe Pucheral SMIS Project LaHDAK Seminar Orsay 4th March 2014 Université de Versailles et St-Quentin INRIA Rocquencourt CNRS
  • 2. PART I: The New Oil. Outline: I. The New Oil; II. Trusted Cells; III. Global SQL Queries; IV. Cost Model and Experiments; V. Conclusion
  • 3. Mass-generation of (personal) data. Data sources have mostly turned digital: analog processes (e.g., photography, films); paper-based interactions (e.g., banking, e-administration); communications (e.g., email, SMS, MMS, Skype). Where is your personal data? … In data centers: 112 new emails per day → mail servers; 65 SMS sent per day → telcos; 800 pages of social data → social networks; web searches, list of purchases → Google, Amazon. People recording, people listening (St Peter's Square, Rome). WHY? Is this a problem? Everything is free… → information extraction
  • 4. Personal data is the new oil. Is this good news? $2 billion a year is spent by US companies on third-party data about individuals (Forrester Report). $44.25 is the estimated return on $1 invested in email marketing (oil is up to $0.5/yr). High market value companies: Facebook: value / #accounts ≈ $50; Google: a $38 billion business that sells ads based on how people search the Web; Amazon (knows purchase intent), mail order systems companies (Gmail), loyalty programs (supermarkets), banks & insurance, employment market (LinkedIn, Viadeo), travel & transportation (voyages-sncf), the « love » market (Meetic), etc.
  • 5. Personal data is the new oil. How would oil companies behave? • Exploit your oil field for free → know all about you • Offer “extra” services → refine their knowledge • Provide real services to their paying customers (e.g. advertisement and profiling, location tracking and spying, …). In other words: your personal data would be processed by sophisticated data refineries… REGARDLESS OF YOUR PRIVACY! It’s the business model… or bad news? (Their choice / Your choice)
  • 6. Is the current centralised model good wrt privacy protection? Intrinsic problem #1: personal data is exposed to sophisticated attacks: high benefits from a successful hack; one person's negligence may affect millions. Intrinsic problem #2: personal data is hostage to sudden privacy changes: centralised administration of data means delegation of control; this leads to regular changes, with application (and business) evolution, with mergers and acquisitions, etc. (e.g. Facebook 2012). Increasing security is only a partial solution since it does not solve those intrinsic limitations. E.g., TrustedDB [BS12] proposes tamper-resistant hardware to secure outsourced centralized databases.
  • 7. A New Hope. A Personal Data Ecosystem built around user-centricity and trust, achieved through a decentralized architecture: THE TRUSTED CELL! (“I want my privacy back!!”) Our goals: preserve current USER functionalities; hinder uncontrolled data exploitation & privacy violations. Our targets: general data management applications (SQL); “low cost” solutions (i.e. acceptable by the general public)
  • 8. PART II: Trusted Cells. Outline: I. The New Oil; II. Trusted Cells; III. Global SQL Queries; IV. Cost Model and Experiments; V. Conclusion
  • 9. The Secure (Trusted) Personal Data Server approach [AAB+10]. The personal database is: • well-organized • tamper resistant • controlled by the owner (sharing, retention, audit) • accessible in disconnected mode. Sources of personal data: hospital, doctor’s office, my bank, my employer, my telco. Application types: private application, secure multi-actor application, secure global queries, external application. Examples shown: epidemiological study; medical care (doctor, nurse); budget optimization; financial help. Approach characteristics: • based on tamper-resistant HW • well-structured world (R-DB, limited apps) • uniform equipment. TRUSTED CELL (figure labels): FLASH (GB size), RAM, FLASH NOR, CPU, Crypto, PDS generic code, Relational DBMS, Operating System, Certified, Encrypted, Personal Database, Bob
  • 10. Why trust personal secure HW solutions? 1. Users store their own data → minimizes abusive usage 2. Auto-administered platform → no DBA attack (even by the user) 3. Enforce privacy principles for externalized (shared) data → best if the recipient of the data is another TC 4. Tamper resistance + certified code/secure execution + single user + physical access needed → the cost/benefit ratio of an attack is very high. Tamper-resistance scale (figure labels): Gemalto secure token, SMIS token (ZED), TrustZone architecture, dedicated HW device, PC? (social trust / open source)
  • 11. The Trusted Cell asymmetric architecture. Built using Secure Portable Tokens as Trusted Cells (called here Trusted Data Servers, or TDSs) and the Cloud as Supporting Server Infrastructure (SSI). ASYMMETRIC: the SSI has HIGH POWER / AVAILABILITY but LOW / NO TRUST; the TCs have LOW POWER / AVAILABILITY but HIGH TRUST. Figure labels: durability, availability; secure computation; export data; encrypted private data; generated data (e.g. sensor). Challenges: local (embedded) data management (not my work: Anciaux, Bouganim, Pucheral et al.); global querying (Part III); data export management (MinExp project with CG78 & LIX)
  • 12. PART III: Global SQL Queries on the Asymmetric Architecture. Outline: I. The New Oil; II. Trusted Cells; III. Global SQL Queries; IV. Cost Model and Experiments; V. Conclusion
  • 13. Example Trusted Cell: a Trusted Data Server (TDS). Token characteristics: • High security: high cost/benefit ratio of an attack; secure against its owner • Modest computing resources (~10Kb of RAM, 50MHz CPU) • Low availability: physically controlled by its owner; connects and disconnects at will. How to compute global queries over decentralized personal data stores while respecting users’ privacy? Figure: an authorized querier asks for the average salary in Orsay; an unauthorized querier is also shown.
  • 14. Secure Global Computation on TCs. PROBLEM: how to perform global queries on the asymmetric architecture (i.e. using data from many/all cells)? The « classical » problem of Secure Global Computation (e.g. SMC) is more general and makes no trust assumption. THREAT MODEL: a TC can be unbreakable (honest) or broken (weakly malicious); the infrastructure (SSI) can be honest-but-curious (semi-honest) or weakly malicious (covert adversary = does not want to be detected). HBC + unbreakable → “simple protocols” presented here (EDBT’14 [TNP14]). WM + broken → must be prevented (via security primitives), see [ANP13]
  • 15. Is this a new problem? Several approaches are possible to securely perform global computations: 1. Use only an untrusted server/cloud/P2P and use generic (and costly) algorithms (e.g. Secure Multi-Party Computing [Yao82, GMW87, CKL06], fully homomorphic encryption [Gent09]). Problem = COST. 2. Use only an untrusted server/cloud/P2P and develop a specific algorithm for each specific class of queries or applications (e.g. DataMining Toolkit [CKV+02]). Problem = GENERICITY. 3. Introduce a tangible element of trust, through the use of a trusted component, and develop a generic methodology to execute any centralized algorithm in this context ([Katz07, GIS+10, AAB+10]). Problem = TRUST
  • 16. Hypotheses on the Querier and the SSI. Querier: • Shares the secret key with the TDSs (to encrypt the query and decrypt the result). • Classical access control policy (e.g. RBAC): cannot get the raw data stored in TDSs (gets only the final result); can obtain only authorized views of the dataset (we do not consider inferential attacks). Supporting Server Infrastructure: • Does not know the query (and so the attributes in the GROUP BY clause) because the query is encrypted by the Querier before being sent to the SSI. • Has prior knowledge about the data distribution. • Honest-but-curious attacker, frequency-based attack: the SSI matches plaintext and ciphertext values of the same frequency, e.g. by investigating remarkable (very high/low) frequencies in the dataset distribution (e.g., X is the only person with a given (high) age who is still working and earning money → if I find a group with only one member, I can deduce that X participates in the dataset).
  • 17. Solution overview. 1) Query, sent through the Supporting Server Infrastructure (SSI): SELECT <attribute(s) and/or aggregate function(s)> FROM <Table(s) / SPTs> [WHERE <condition(s)>] [GROUP BY <grouping attribute(s)>] [HAVING <grouping condition(s)>] [SIZE <size condition(s)>]; example: SELECT age, AVG(salary) FROM user WHERE town = “Orsay” GROUP BY age HAVING MIN(salary) > 0 SIZE 2) Collection and filtering phase (e.g. John, 35K; Mary, 43K; Paul, 100K); stop condition: max #tuples or max time 3) Aggregation phase 4) Aggregate filtering phase
  • 18. Proposed solutions. The main difficulty is with AGGREGATE QUERIES! Solutions vary depending on which kind of encryption is used, how the SSI constructs the partitions, and what information is revealed to the SSI. • Secure aggregation solution • Noise-based solutions: random (white) noise; noise controlled by the complementary domain • Histogram-based solutions. We investigate these solutions along the directions of performance and security.
  • 19. Secure Aggregation. The Querier sends Qi = <EK1(Q), Cred, Size> to the Supporting Server Infrastructure (SSI), with Q: SELECT Age, AVG(Salary) WHERE city = Orsay GROUP BY Age HAVING Min(Salary) > 0. Each TDS decrypts Qi, checks the AC rules, and encrypts its data using non-deterministic encryption: e.g. (25y, Orsay, 35K) → (#x3Z, aW4r); (45y, Orsay, 43K) → ($f2&, bG?3); (53y, Paris, 100K) → no answer? The SSI forms partitions of encrypted tuples (sized to fit the resources of a TDS). TDSs hold a partial aggregation (Gij, AGGk): e.g. (25, 35K), (45, 43K), (45, 37K) → (25, [35K, 1]), (45, [40K, 2]), returned encrypted, e.g. (F!d2, s7@z), (ZL5=, w2^Z). Final aggregation: e.g. (25, 29.5K), (45, 43.7K), …; evaluate the HAVING clause; the final result is returned encrypted to the Querier, e.g. (#f4R, bZ_a), (Ye”H, fw%g).
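The partial-aggregation step of slide 19 can be sketched in a few lines. This is only an illustration under stated assumptions: encryption, access control and the SSI are left out, the function names `partial_aggregate`, `merge` and `finalize` are hypothetical, and the tuples are the three plaintext examples shown on the slide.

```python
from collections import defaultdict

def partial_aggregate(partition):
    """One TDS folds a partition of (group, salary) tuples into
    per-group (sum, count) pairs (hypothetical helper)."""
    acc = defaultdict(lambda: [0.0, 0])
    for group, salary in partition:
        acc[group][0] += salary
        acc[group][1] += 1
    return {g: (s, c) for g, (s, c) in acc.items()}

def merge(partials):
    """A later TDS merges partial aggregates produced by earlier ones."""
    acc = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for g, (s, c) in partial.items():
            acc[g][0] += s
            acc[g][1] += c
    return {g: (s, c) for g, (s, c) in acc.items()}

def finalize(merged):
    """Turn (sum, count) pairs into AVG values."""
    return {g: s / c for g, (s, c) in merged.items()}

# Two partitions as the SSI might hand them out (salaries in K).
partitions = [[(25, 35.0), (45, 43.0)], [(45, 37.0)]]
partials = [partial_aggregate(p) for p in partitions]
merged = merge(partials)
averages = finalize(merged)
```

Keeping a (sum, count) pair rather than a running average is what lets partial results from different TDSs be merged in any order.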
  • 20. Noise-Based Protocols. Efficiency problem of Secure Aggregation: nDet_Enc on AG → the SSI cannot gather tuples belonging to the same group into the same partition. But Det_Enc on AG → frequency-based attack. Idea: add noise (fake tuples) to hide the distribution of AG. How many fake tuples (nf) are needed? It depends on the disparity in frequencies among AG: small nf: random noise; big nf: white noise; nf = n-1: controlled noise (n: AG domain cardinality). Efficiency: each TDS handles tuples belonging to one group (instead of a large partial aggregation as in SAgg); however, generating and processing the very large number of fake tuples is costly.
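A small sketch of the controlled-noise case (nf = n-1) from slide 20, assuming that each true tuple is accompanied by one fake tuple for every other value of the AG domain. The function name, the tuple layout and the `is_true` flag are hypothetical; in a real protocol the flag and the value would sit inside the encrypted payload.

```python
from collections import Counter

def with_controlled_noise(true_tuples, domain):
    """For each true (group, value) tuple, also emit one fake tuple
    per other domain value, so nf = len(domain) - 1."""
    out = []
    for group, value in true_tuples:
        out.append((group, value, True))       # the real tuple
        for other in domain:
            if other != group:
                out.append((other, 0, False))  # fake tuple, ignored when aggregating
    return out

domain = [25, 35, 45, 55]                      # assumed AG domain (ages)
true_tuples = [(25, 35), (45, 43), (45, 37)]   # (age, salary in K)
sent = with_controlled_noise(true_tuples, domain)

# What an observer of the (deterministically encrypted) AG would count.
freq = Counter(group for group, _, _ in sent)
```

Every domain value ends up with the same frequency, which is the point of the technique, at the price of multiplying the number of tuples by the domain cardinality.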
  • 21. Nearly Equi-Depth Histogram Solution. 1. The distribution of AG is discovered and distributed to all TDSs. 2. Each TDS allocates its tuple to the corresponding bucket. 3. Each TDS sends to the SSI: {h(bucketId), nDet_Enc(tuple)}. Consequences: we do not generate and process too many fake tuples; we do not handle too large a partial aggregation. Figure: true distribution vs. nearly equi-depth histogram. Problem: the distribution must be discovered → this can be done “offline” using secure aggregation!
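One way to picture the bucket allocation of slide 21 is a greedy heuristic that always adds the next most frequent AG value to the currently lightest bucket. This is an assumed heuristic for illustration, not necessarily the construction used by the authors; `build_buckets`, `bucket_tag`, the salt and the sample distribution are all hypothetical, and the nDet_Enc(tuple) part of the message is omitted.

```python
import hashlib
import heapq

def build_buckets(distribution, num_buckets):
    """Greedily map each AG value to the lightest bucket so far,
    processing values from most to least frequent."""
    heap = [(0, b) for b in range(num_buckets)]   # (depth, bucket id)
    heapq.heapify(heap)
    assignment = {}
    for value, count in sorted(distribution.items(), key=lambda kv: -kv[1]):
        depth, b = heapq.heappop(heap)
        assignment[value] = b
        heapq.heappush(heap, (depth + count, b))
    return assignment

def bucket_tag(bucket_id, salt=b"query-salt"):
    """h(bucketId): what the SSI sees instead of the AG value."""
    return hashlib.sha256(salt + str(bucket_id).encode()).hexdigest()

distribution = {25: 40, 35: 10, 45: 30, 55: 20}   # assumed AG frequencies
assignment = build_buckets(distribution, 2)

depths = {}
for value, b in assignment.items():
    depths[b] = depths.get(b, 0) + distribution[value]
```

With this sample distribution the two buckets come out with equal depth, so the SSI can group tuples by tag without learning the underlying frequencies.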
  • 22. Information Exposure Analysis [DCJP+03]. To measure information exposure, we consider the probability that an attacker (here the honest-but-curious SSI) can reconstruct the plaintext table (or part of the table) using the encrypted table and his prior knowledge about the global distributions of plaintext attributes. Information exposure is noted $\varepsilon = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{k} IC_{i,j}$, where: • n is the number of tuples • k is the number of attributes • $IC_{i,j}$ is the value in row i and column j of the inverse cardinality table (= 1/number of plaintext values that could correspond) • $N_j$ is the number of distinct plaintext values in the global distribution of the attribute in column j (i.e., $N_j \le n$).
  • 23. Information Exposure Analysis (Damiani et al., CCS 2003). Notation: n is the number of tuples; k is the number of attributes; $IC_{i,j}$ is the IC for row i and column j; $N_j$ is the number of distinct plaintext values in the global distribution of the attribute in column j (i.e., $N_j \le n$). SAgg: $IC_{i,j} = 1/N_j$ for all i, j, so $\varepsilon_{S\_Agg} = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{k}\frac{1}{N_j} = \prod_{j=1}^{k}\frac{1}{N_j}$. EDHist: requires finding all possible partitions of the plaintext values such that the sum of their occurrences is the cardinality of the hashed value (the NP-hard multiple subset sum problem); $\min(\varepsilon_{ED\_Hist}) = \prod_{j=1}^{k}\frac{1}{N_j}$. Noise_based and ED_Hist have a uniform distribution of the AG: $\varepsilon_{ED\_Hist} = \varepsilon_{Noise\_based}$. Plaintext: $\varepsilon_{P\_Text} = \frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{k} 1 = 1$. Overall: $\varepsilon_{S\_Agg} \le \varepsilon_{ED\_Hist} = \varepsilon_{Noise\_based} < 1$.
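The exposure coefficient of slides 22 and 23 is easy to evaluate numerically. The sketch below assumes an inverse-cardinality table given as a list of rows; the helper names `exposure` and `sagg_table` and the sample sizes (4 tuples, two attributes with 4 and 2 distinct values) are made up for illustration.

```python
from math import prod

def exposure(ic_table):
    """epsilon = (1/n) * sum over rows of the product of IC values."""
    n = len(ic_table)
    return sum(prod(row) for row in ic_table) / n

def sagg_table(n, distinct_counts):
    """IC table for secure aggregation: IC[i][j] = 1 / N_j for every row."""
    return [[1.0 / nj for nj in distinct_counts] for _ in range(n)]

n = 4
distinct = [4, 2]                                      # N_1 = 4, N_2 = 2

eps_sagg = exposure(sagg_table(n, distinct))           # equals 1/4 * 1/2
eps_plain = exposure([[1.0, 1.0] for _ in range(n)])   # plaintext: every IC is 1
```

Because every row of the SAgg table is identical, the average over rows collapses to the single product of the 1/N_j terms, which is the closed form given on slide 23.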
  • 24. PART IV: Cost Model and Experiments. Outline: I. The New Oil; II. Trusted Cells; III. Global SQL Queries; IV. Cost Model and Experiments; V. Conclusion
  • 25. Unit test calibration: internal time consumption. Eval board: • 32-bit RISC CPU: 120 MHz • Crypto-coprocessor: AES, SHA • 64 KB RAM, 1 GB NAND flash • USB full speed: 12 Mbps. SMIS-developed token (ZED electronics): same technical characteristics; price = 50 EUR (small series)
  • 26. Parameters for the cost model. Dataset size Ttuple: varies from 5 to 65 million. Number of groups G: varies from 1 to 10^6. Number of TDSs participating in the computation, as a percentage of all TDSs connected at a given time, Ttds: varies from 1% to 100%. We fix two parameters and vary the other, measuring: execution time, parallelism of the protocol, total load, maximum load on one TDS. When the parameters are fixed: Ttuple = 10^6, G = 10^3, % of TDSs connected = 10% of Ttuple. We also compute and use the optimal value for all reduction factors as well as for. In the figures, we plot two curves for the Rnf_Noise protocols, RN (nf = 2) and WN (nf = 1000), to capture the impact of the ratio of fake tuples.
  • 27. EXECUTION TIME. Panel 1 (Ttuple = 10^6; G = 1 to 10^6). Naïve, noise-based, ED & EW: • when G increases with Ttuple fixed, the number of tuples in each group decreases • execution time depends only on the number of tuples in each group (because all groups are processed in parallel) → exeTime decreases when G increases. Secure Count: • when G increases, the time for processing the big partial aggregation increases accordingly • the parallel computation cannot be fully deployed (a group cannot be divided among TDSs in parallel; each TDS has to handle all G groups) → exeTime increases. Panel 2 (Ttuple = 5·10^6 to 35·10^6; G = 1000). Naïve, RN, ED & EW: • when Ttuple increases, Ttds increases accordingly → not much change. Secure Count: • the number of recursive steps increases when Ttuple increases → exeTime increases. WN, CN: • the number of fake tuples increases linearly with the number of true tuples → exeTime also increases linearly to handle the fake and true tuples
  • 28. NUMBER OF PARTICIPATING TDSs. Panel 1 (Ttuple = 10^6; G = 1 to 10^6). Secure Count: • when G increases, the level of convergence is low and the size of each aggregation is big → fewer participating TDSs are needed to build the aggregations and reach a high convergence level. Other solutions: • since each group is processed in parallel and independently, when G increases the level of parallelism increases → more TDSs are needed to participate in the parallel computation. Panel 2 (Ttuple = 5·10^6 to 35·10^6; G = 1000). WN, CN: • when the true Ttuple increases, the number of fake tuples increases as well → more TDSs are needed to process fake tuples. Secure Count: • the level of parallelism is lower than for the other solutions → needs the fewest TDSs
  • 29. TOTAL LOAD (NETWORK OVERHEAD). Panel 1 (Ttuple = 10^6; G = 1 to 10^6). Noise-based: • highest load because of the fake tuples • when G increases but Tpds does not change, the number of tuples (both true and fake) does not change → the total load is the same. Others: lower load since they handle only true data. Panel 2 (Ttuple = 5·10^6 to 35·10^6; G = 1000). Noise-based: • when the true Ttuple increases, the number of fake tuples increases linearly → the total load is the highest and increases
  • 30. MAXIMUM LOAD. Panel 1 (Ttuple = 10^6; G = 1 to 10^6). Secure Count: • when G increases, the size of each aggregation is big → each PDS processes a bigger aggregation • when G increases, the number of participating PDSs decreases → each participating PDS incurs a higher load. Others: • when G increases, the number of participating PDSs decreases and the number of tuples in each group decreases → each PDS processes fewer tuples → maxLoad decreases. Panel 2 (Ttuple = 5·10^6 to 35·10^6; G = 1000). WN, CN: • use all available PDSs → maxLoad increases linearly when Ttuple increases. Others: when Ttuple increases, the number of participating PDSs also increases accordingly → in general, maxLoad does not increase too much
  • 31. AVERAGE LOAD. Panels: Ttuple = 10^6, G = 1 to 10^6; Ttuple = 5·10^6 to 35·10^6, G = 1000. Secure Count: • the total load is unchanged but the number of participating TDSs is reduced when G increases → the average load increases. WN, CN: • the high total load is the same and all PTpds = 10^5 participate in the computation → every PDS incurs the same amount of load. Others: • when G increases, there are more participating PDSs and the total load is unchanged → AvgLoad decreases. Although TotalLoad(CN) > TotalLoad(SC), PTpds(CN) >> PTpds(SC) → AvgLoad(CN) < AvgLoad(SC)
  • 32. CONSUMED MEMORY (compared with the actual RAM size of a TDS). Noise-based: • needs to store only 1 group regardless of G → requires the least RAM. Histogram-based: • each PDS stores h groups (h > 1) regardless of G → requires more RAM. SC: • each PDS stores all G groups • when G increases, the RAM needed increases → requires the most RAM • exceeds the actual RAM size → future work
  • 33. AVERAGE TIME FOR A PDS TO CONNECT. Panels: Ttuple = 10^6, G = 1 to 10^6; Ttuple = 5·10^6 to 35·10^6, G = 1000. Secure Count: • the number of participating PDSs is reduced when G increases → the average time increases. WN, CN: • the high total load is unchanged and all PTpds = 10^5 participate in the computation → every PDS takes the same amount of time to process data. Others: • when G increases, there are more participating PDSs → AvgTime decreases. High AvgTime: • WN, CN: because of too many fake tuples • SC: because of very few participating PDSs
  • 34. Theoretical scalability. Panels: Tpds = 1% Ttuple; Tpds = 10% Ttuple; Tpds = 100% Ttuple. Secure Count: has a (low) maximum number of participants. Others: WN has higher scalability than the others (in the sense that adding participants counts)
  • 36. COMPARISON WITH OTHER STATE-OF-THE-ART METHODS. Reference: Answering aggregation queries in a secure system model (Ge & Zdonik, VLDB 2007). Hardware: • Linux workstation • AMD Athlon-64 2 GHz processor • 512 MB memory. Results: • SC: depends mostly on G (slightly on Ttuple) • Others: do not depend on G, but mostly on Ttuple. DES: each value is decrypted and the computation is performed on the plaintext; the server must have access to the secret key and the plaintext (violates security requirements). Paillier: performs the computation directly on the ciphertext using a secure homomorphic encryption scheme: enc(a + b) = enc(a) + enc(b). The server performs the computation without having access to the secret key or the plaintext. In the end, the ciphertexts are passed back to the trusted agent (i.e., the key holder) to perform a final decryption and a simple calculation of the final result
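To make the homomorphic property on slide 36 concrete, here is a toy Paillier sketch. The slide writes the property abstractly as enc(a + b) = enc(a) + enc(b); in Paillier the "+" on ciphertexts is realised as a multiplication modulo n². The primes below are tiny and chosen only so the example runs instantly, so this is in no way secure, and the function names are hypothetical (Python 3.8+ is assumed for the modular inverse via `pow`).

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy key generation with g = n + 1 (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # inverse of lam mod n
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """c = (n+1)^m * r^n mod n^2 with a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add(n, c1, c2):
    """Homomorphic addition: multiply the ciphertexts modulo n^2."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
c_sum = add(n, encrypt(n, 35), encrypt(n, 43))   # salaries in K
total = decrypt(n, priv, c_sum)
```

Only the key holder can run `decrypt`, which matches the slide's description: the server combines ciphertexts and the trusted agent performs the final decryption.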
  • 37. Metrics for the evaluation of the proposed solutions: total load; average time/load; query response time; information exposure; throughput; resource variation
  • 38. Trade-off between criteria. Query form: Select .. From .. Where .. Group By AG, with G = card(AG). Security: S_Agg > ED_Hist. Performance: for G > 10, ED_Hist > S_Agg; for G <= 10, ED_Hist < S_Agg
  • 39. PART V: Conclusion and perspectives. Outline: I. The New Oil; II. Trusted Cells; III. Global SQL Queries; IV. Cost Model and Experiments; V. Conclusion
  • 40. Short/middle term research: data intensive computing on an asymmetric architecture. • SQL queries here do not have joins! • Take into account malicious SSI / broken tokens • Field experiment on usability (with ISN) • Private/secure MapReduce: investigate compatibility of our protocols; develop new protocols; check performance! • XML management: adapt the work on XQ2P (Butnaru, Gardarin, Nguyen) to the Trusted Cells context; distributed window queries.
  • 41. Promoting the Trusted Cells vision. Trusted Cells “Core”: open hardware and software bundle with basic functionalities (local DB, distributed DB, NoSQL DB) → needed to develop PbD personal data management applications! Promote an open source community around Trusted Cells. UVSQ FabLab: bring secure data management to the Versailles FabLab. Beyond tamper-resistant HW: results are usable even with lower-trust elements; include social trust / reputation.
  • 42. QUESTIONS?
