Towards Statistical Queries over
Distributed Private User Data
R. Chen, A. Reznichenko, P. Francis – MPI-SWS, Germany
J. Gehrke – Cornell University, USA
Serafeim Chatzopoulos
M1258
schatz@di.uoa.gr
MDE519 – Distributed Systems
Instructor: Mema Roussopoulou
May 31, 2013
User Privacy
Towards Statistical Queries over Distributed Private User Data 2
- User data is exposed to organizations in many ways.
  - Users are aware that their data is exposed, e.g., when they:
    - make a purchase in an online store;
    - update a profile on a social network.
  - Users are unaware that their data is exposed, e.g., through:
    - third-party trackers;
    - smartphone apps.
The “user-owned and operated” principle
- Personal data should be stored on a local host or a cloud device under the user's control, and released only in a controlled, limited, or noisy fashion.
Users must have exclusive control of their own data and must be able to share data selectively and voluntarily.
Motivation and Problem
- Distributed private user data is valuable.
- Analysts could use such data to:
  - understand users' behavior;
  - discover statistical patterns;
  - evaluate proposed enhancements.
- How can statistical queries be made over such distributed private user data while still preserving privacy?
Related Work
- Anonymization
  Removes well-known personally identifiable information (PII).
- Randomization
  Adds random distortion values to user data.
- k-anonymity, l-diversity, t-closeness
- Differential privacy
Differential Privacy
- Differential privacy adds noise to the output of a computation (i.e., the answer to a query).
- It hides the presence or absence of any single record in the dataset.
- It makes no assumptions about the adversary's background knowledge.
Because the data stays with the users and no single trusted party holds it all, some form of distributed differential privacy is required…
Prior Distributed Differential Privacy Designs
- The first design has a per-user computational load of O(U), where U is the number of users.
  Dwork et al., EUROCRYPT '06
  - Poor scalability.
- Subsequent designs reduce the per-user computational load to O(1) by using expensive secret-sharing protocols.
  Rastogi and Nath, SIGMOD '10; Shi et al., NDSS '11
  - Do not tolerate churn.
- Recent designs introduce two honest-but-curious servers that collaboratively compute the query result.
  Götz and Nath, MSR-TR '11
  - Even a single malicious user can substantially distort the query result.
Practical Distributed Differential Privacy System (PDDP)
- Goals:
  - The differential privacy guarantee is always maintained for every honest client.
  - Place a tight bound on the extent to which malicious users can distort query results.
    - The maximum absolute distortion in the final result is bounded by the number of malicious users.
  - Operate at large scale.
    - Millions of users.
  - Tolerate churn.
    - Churn does not prevent results from being produced.
PDDP Components
- Analyst
  - Makes queries to the system and collects answers.
- Proxy
  - Adds differentially private noise to clients' answers to preserve privacy.
- Clients
  - Locally maintain their own data and answer queries.
Security Assumptions (1/2)
- General assumptions
  - Clients have the correct public keys for the analyst and the proxy.
  - The analyst and the proxy have the correct public keys for each other.
  - The corresponding private keys are kept secure.
- The analyst is potentially malicious (it may try to violate users' privacy). It may:
  - collude with other analysts;
  - pretend to be multiple distinct analysts;
  - take control of clients and use the PDDP protocol to reveal information;
  - publish its collected answers;
  - intercept and modify all messages.
Security Assumptions (2/2)
- The proxy is honest but curious (HbC). It:
  - follows the specified protocol;
  - tries to exploit any additional information it can learn in doing so;
  - does not collude with other components.
- Clients are potentially malicious (they may try to distort the statistical results learned by analysts). They:
  - exhibit churn;
  - have limited resources for computation and data transmission;
  - may generate false or illegitimate answers;
  - may act as Sybils.
PDDP Key insights – Binary answer
- How can query result distortion be limited?
  - Split the answer's value range into buckets.
  - Enforce a binary answer in each bucket.
    - Goldwasser-Micali (GM) bit cryptosystem.
Example:
Query: "SELECT age FROM info WHERE gender='m'"
- 4 buckets: 0~12, 13~20, 21~59, and ≥60.
- Answers: '1' or '0' per bucket.
- Malicious clients cannot substantially distort the query result: each one can change each bucket's count by at most 1.
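To make the bound concrete, here is a minimal plaintext sketch, assuming the four age buckets from the example. The names `encode_answer` and `aggregate` are hypothetical. In PDDP the single-bit constraint is enforced cryptographically by GM encryption; the explicit check in `aggregate` only stands in for it here.

```python
from typing import List, Optional, Sequence, Tuple

# Age buckets from the example query: 0~12, 13~20, 21~59, >=60.
# None as an upper bound means "no upper limit".
BUCKETS: List[Tuple[int, Optional[int]]] = [(0, 12), (13, 20), (21, 59), (60, None)]


def encode_answer(age: int) -> List[int]:
    """Return one bit per bucket: 1 if the age falls in that bucket, else 0."""
    return [1 if lo <= age and (hi is None or age <= hi) else 0 for lo, hi in BUCKETS]


def aggregate(answers: Sequence[Sequence[int]]) -> List[int]:
    """Sum the per-bucket bits, rejecting any value that is not a single bit."""
    totals = [0] * len(BUCKETS)
    for answer in answers:
        if len(answer) != len(BUCKETS) or any(bit not in (0, 1) for bit in answer):
            raise ValueError("each bucket must hold exactly one bit")
        for i, bit in enumerate(answer):
            totals[i] += bit
    return totals


honest = [encode_answer(age) for age in (8, 17, 35, 42, 71)]
honest_totals = aggregate(honest)
# A malicious client answering '1' in every bucket can add at most 1 per bucket.
attacked_totals = aggregate(honest + [[1, 1, 1, 1]])
```

With five honest clients aged 8, 17, 35, 42 and 71 the totals are [1, 1, 2, 1]; the malicious answer raises each total by exactly 1, which is the per-client bound the slide describes.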
PDDP Key insights – Blind noise
- How is differential privacy achieved?
  - An honest-but-curious proxy generates additional binary answers in each bucket as differentially private noise.
- If the analyst publishes the final noisy result and the proxy knows the noise it added, the proxy can subtract that noise from the published result to obtain a noise-free result.
- Solution: the proxy adds noise blindly!
  - The proxy knows that the added noise is sufficient to achieve differential privacy.
  - The proxy does not know the exact noise added.
PDDP Workflow – Step 1
- Query initialization
  The analyst first issues a query to the proxy.
- The message consists of 4 items:
  - Query: SELECT age FROM info WHERE gender='m'
  - Buckets: 0∼12, 13∼20, 21∼59, and ≥60.
  - Number of clients queried (c): 1000
  - DP parameter (ε): 1.0
    - Controls the trade-off between the accuracy of the computation and the strength of its privacy guarantee: a smaller ε gives stronger privacy but noisier answers.
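A minimal sketch of the four-item message as a Python dataclass, filled with the example values. The class and field names are hypothetical; the slides do not specify a wire format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QueryMessage:
    """Hypothetical container for the four items the analyst sends to the proxy."""
    sql: str                                       # query each client runs locally
    buckets: Tuple[Tuple[int, Optional[int]], ...]  # (low, high); None = unbounded
    num_clients: int                               # c, number of clients to query
    epsilon: float                                 # differential privacy parameter


example = QueryMessage(
    sql="SELECT age FROM info WHERE gender='m'",
    buckets=((0, 12), (13, 20), (21, 59), (60, None)),
    num_clients=1000,
    epsilon=1.0,
)
```

Making the message immutable (`frozen=True`, tuples for buckets) reflects that the proxy should forward the query unchanged.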
PDDP Workflow – Step 2
- Query forwarding
  The proxy selects clients and sends them the query.
- The proxy:
  - rejects the query if c is too low or too high;
  - rejects the query if ε exceeds the maximum allowed (i.e., the privacy guarantee would be too weak);
  - selects c unique clients and sends them the query under one of the following policies:
    - select c clients randomly and wait for them to connect;
    - select the first c clients that connect.
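The proxy's checks and the "first c clients that connect" policy could be sketched as follows. The limits `MIN_CLIENTS`, `MAX_CLIENTS` and `MAX_EPSILON` are hypothetical values; the slides say only that such limits exist.

```python
from typing import Iterable, List, Set

# Hypothetical policy limits, not taken from the slides.
MIN_CLIENTS = 100
MAX_CLIENTS = 10_000_000
MAX_EPSILON = 5.0


def validate_query(num_clients: int, epsilon: float) -> None:
    """Raise ValueError if c or epsilon falls outside the proxy's policy."""
    if not MIN_CLIENTS <= num_clients <= MAX_CLIENTS:
        raise ValueError(f"c={num_clients} is outside [{MIN_CLIENTS}, {MAX_CLIENTS}]")
    if not 0 < epsilon <= MAX_EPSILON:
        raise ValueError(f"epsilon={epsilon} is not in (0, {MAX_EPSILON}]")


def select_first_connecting(connections: Iterable[str], num_clients: int) -> List[str]:
    """Keep the first num_clients unique client IDs, in order of arrival."""
    seen: Set[str] = set()
    selected: List[str] = []
    for client_id in connections:
        if client_id not in seen:       # a reconnecting client is counted once
            seen.add(client_id)
            selected.append(client_id)
            if len(selected) == num_clients:
                break
    return selected
```

The random policy would instead draw c client IDs up front and wait for exactly those clients to connect.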
PDDP Workflow – Step 3 (1/2)
- Client response
  Clients execute the query and send their answers.
- Each client executes the query over its local data and produces an answer:
  - '1' or '0' per bucket;
  - more than one bucket may contain a '1'.
- Each per-bucket answer value is individually encrypted with the analyst's public key (GM cryptosystem).
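Since the client add-on keeps its data in local SQLite storage, the local step can be sketched with Python's built-in `sqlite3` module. The table layout and the rule that a bucket gets '1' whenever any returned row falls in it are assumptions made for illustration; encrypting the resulting bits is omitted here.

```python
import sqlite3
from typing import List, Optional, Tuple

BUCKETS: List[Tuple[int, Optional[int]]] = [(0, 12), (13, 20), (21, 59), (60, None)]


def in_bucket(value: int, bucket: Tuple[int, Optional[int]]) -> bool:
    lo, hi = bucket
    return value >= lo and (hi is None or value <= hi)


def answer_query(conn: sqlite3.Connection, sql: str) -> List[int]:
    """Run the query on local data and return one bit per bucket.

    Assumption: a bucket is '1' if any returned value falls in it, so a query
    returning several rows can set more than one bucket.
    """
    values = [row[0] for row in conn.execute(sql)]
    return [1 if any(in_bucket(v, b) for v in values) else 0 for b in BUCKETS]


conn = sqlite3.connect(":memory:")  # stands in for the add-on's local database
conn.execute("CREATE TABLE info (age INTEGER, gender TEXT)")
conn.execute("INSERT INTO info VALUES (34, 'm')")
bits = answer_query(conn, "SELECT age FROM info WHERE gender='m'")
```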
PDDP Workflow – Step 3 (2/2)
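- Goldwasser-Micali (GM) cryptosystem
  - Single-bit cryptosystem
    - Enforces a binary answer in each bucket: every ciphertext decrypts to a single bit, so a client cannot encode a larger value.
  - Very efficient
  - XOR-homomorphic
    E(a) * E(b) mod m = E(a XOR b), where m is the public modulus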
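A toy Goldwasser-Micali implementation shows both properties: every ciphertext decrypts to a single bit, and multiplying ciphertexts modulo m XORs the plaintexts. The function names and the tiny primes are illustrative assumptions; these parameters offer no security.

```python
import math
import secrets
from typing import Tuple

# Toy primes for illustration only; real GM keys use primes of hundreds of bits.
P, Q = 10007, 10009

PublicKey = Tuple[int, int]   # (m, x): modulus m = p*q and a non-residue x
PrivateKey = Tuple[int, int]  # (p, q)


def is_qr(a: int, p: int) -> bool:
    """Euler's criterion: True if a is a quadratic residue modulo the odd prime p."""
    return pow(a, (p - 1) // 2, p) == 1


def keygen(p: int = P, q: int = Q) -> Tuple[PublicKey, PrivateKey]:
    x = 2
    while is_qr(x, p) or is_qr(x, q):  # x must be a non-residue mod both primes
        x += 1
    return (p * q, x), (p, q)


def encrypt(bit: int, pub: PublicKey) -> int:
    """E(b) = y^2 * x^b mod m for a fresh random y coprime to m."""
    if bit not in (0, 1):
        raise ValueError("GM encrypts single bits only")
    m, x = pub
    y = secrets.randbelow(m - 2) + 2
    while math.gcd(y, m) != 1:
        y = secrets.randbelow(m - 2) + 2
    return (y * y * pow(x, bit, m)) % m


def decrypt(c: int, priv: PrivateKey) -> int:
    """The plaintext is 0 if c is a quadratic residue mod p, otherwise 1."""
    p, _ = priv
    return 0 if is_qr(c, p) else 1


def homomorphic_xor(c1: int, c2: int, pub: PublicKey) -> int:
    """E(a) * E(b) mod m is an encryption of a XOR b."""
    return (c1 * c2) % pub[0]


pub, priv = keygen()
```

When a = b = 1 the product contains x², which is a square, so it decrypts to 0 as XOR requires.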
PDDP Workflow – Step 4
- Blind noise addition
- The proxy maintains a pool of additional binary answers, called coins, and adds them as noise to each bucket.
  - Coins must be unbiased.
  - Coins are encrypted with the analyst's public key.
  - n coins must be added to each bucket:
How can coins be generated blindly?
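These slides do not state n explicitly. One formula that reproduces both numeric examples in this deck (16 coins per bucket for c = 250, ε = 5.0, and a 68% distance of about 15.24 for c = 10^6, ε = 1.0) is n = ⌊64 ln(2/δ)/ε²⌋ + 1 with δ = 1/c. The sketch below assumes that formula; it may differ from the paper's exact expression, and `coins_per_bucket` is a hypothetical name.

```python
import math


def coins_per_bucket(epsilon: float, num_clients: int) -> int:
    """Coins n per bucket under the ASSUMED formula n = floor(64 ln(2/delta) / eps^2) + 1.

    delta is assumed to be 1/c, so ln(2/delta) = ln(2c). This is an assumption
    chosen to reproduce the slides' numbers, not a quote from them.
    """
    if epsilon <= 0 or num_clients <= 0:
        raise ValueError("epsilon and num_clients must be positive")
    return math.floor(64 * math.log(2 * num_clients) / epsilon ** 2) + 1


deployment_n = coins_per_bucket(5.0, 250)      # parameters of the client deployment
large_scale_n = coins_per_bucket(1.0, 10**6)   # parameters of the utility example
```

The formula makes the trade-off visible: halving ε quadruples the number of coins, and therefore the noise variance n/4.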
Coin pool generation
- Straightforward approaches
  - The proxy generates the coins.
    - A curious proxy could learn the noise-free result.
  - Clients generate the coins.
    - Malicious clients could generate biased coins.
Collaborative coin generation
- The paper's approach
  - Each online client periodically generates an encrypted unbiased coin E(oc) and sends it to the proxy.
  - The proxy receives the coin and verifies its legitimacy.
  - The proxy blindly re-flips the coin E(oc) by multiplying it by a locally generated encrypted unbiased coin E(op), modulo m:
    E(oc) * E(op) mod m = E(oc XOR op),
    where m is part of the analyst's public key.
  - The proxy stores the resulting coin in its locally maintained pool. Because op is unbiased, oc XOR op is unbiased even if a malicious client biased oc.
  - The proxy does not know the actual value of the stored coin, because it never learns oc.
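The re-flip step can be sketched with a toy GM implementation (tiny primes, no security). Two labeled assumptions: `verify_coin` checks only that the ciphertext has Jacobi symbol +1 modulo m, which every GM ciphertext satisfies, as a cheap necessary check in the spirit of the Jacobi-symbol check mentioned in the evaluation; and `reflip` accepts an optional fixed proxy bit so the result can be tested.

```python
import math
import secrets
from typing import Optional, Tuple

P, Q = 10007, 10009  # toy primes for illustration only


def is_qr(a: int, p: int) -> bool:
    return pow(a, (p - 1) // 2, p) == 1


def keygen(p: int = P, q: int = Q) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    x = 2
    while is_qr(x, p) or is_qr(x, q):  # non-residue mod both primes
        x += 1
    return (p * q, x), (p, q)


def encrypt(bit: int, pub: Tuple[int, int]) -> int:
    m, x = pub
    y = secrets.randbelow(m - 2) + 2
    while math.gcd(y, m) != 1:
        y = secrets.randbelow(m - 2) + 2
    return (y * y * pow(x, bit, m)) % m


def decrypt(c: int, priv: Tuple[int, int]) -> int:
    return 0 if is_qr(c, priv[0]) else 1


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def verify_coin(c: int, pub: Tuple[int, int]) -> bool:
    """Necessary, not sufficient: every GM ciphertext has Jacobi symbol +1 mod m."""
    m, _ = pub
    return 0 < c < m and jacobi(c, m) == 1


def reflip(client_coin: int, pub: Tuple[int, int], proxy_bit: Optional[int] = None) -> int:
    """Proxy side: E(oc) * E(op) mod m, an encryption of oc XOR op."""
    if not verify_coin(client_coin, pub):
        raise ValueError("illegitimate coin")
    if proxy_bit is None:
        proxy_bit = secrets.randbelow(2)  # the proxy's own unbiased bit op
    return (client_coin * encrypt(proxy_bit, pub)) % pub[0]


pub, priv = keygen()
```

Only the analyst, holding the private key, can decrypt the stored coin; the proxy knows op but not oc, so it cannot tell which value it stored.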
PDDP Workflow – Step 5
- Noisy answers to the analyst
  - Each bucket holds the clients' answers plus the coins (noise).
  - After a random delay, the proxy shuffles the c + n values.
    - This prevents a client from being identified by the vector of '1's and '0's in its answer.
- Finally, the analyst:
  - decrypts all encrypted binary values with its private key;
  - sums the plaintext values obtained;
  - subtracts n/2, the expected contribution of the coins, to obtain the noisy count of clients that fall within each bucket.
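A plaintext simulation of one bucket illustrates the analyst's arithmetic. It skips encryption entirely and assumes the coins are fair; `noisy_bucket_count` is a hypothetical name.

```python
import random
from typing import List, Sequence


def noisy_bucket_count(client_bits: Sequence[int], n_coins: int, rng: random.Random) -> float:
    """Simulate one bucket in plaintext and return the analyst's estimate."""
    coins = [rng.randint(0, 1) for _ in range(n_coins)]  # proxy's unbiased coins
    values: List[int] = list(client_bits) + coins
    rng.shuffle(values)          # hides which value came from which client
    total = sum(values)          # what the analyst gets after decrypting every value
    return total - n_coins / 2   # remove the coins' expected contribution


rng = random.Random(42)
true_count = 300
estimate = noisy_bucket_count([1] * true_count + [0] * 700, n_coins=929, rng=rng)
```

The shuffle does not change the sum; its purpose is unlinkability, since the analyst cannot tell coins from client answers.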
Practical Considerations (1/2)
- Utility of the aggregate result
  - Depends on the amount of added noise.
  - The n coins added by the proxy, after the analyst subtracts their mean n/2, follow a binomial distribution that approximates the normal distribution N(0, n/4).
- Example:
  c = 10^6, ε = 1.0
  Given the normal distribution in each bucket, the noisy answer is within:
  - 15.24 of the true answer with 68% probability;
  - 30.48 of the true answer with 95% probability;
  - 45.72 of the true answer with 99.7% probability.
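Since the coin sum minus n/2 has variance n/4, its standard deviation is σ = √n/2, and the 68/95/99.7% distances are σ, 2σ and 3σ. The figures above correspond to n = 929 coins per bucket (√929/2 ≈ 15.24). A small helper, hypothetical in name, reproduces them and the deployment's 16-coin case.

```python
import math
from typing import Tuple


def error_bounds(n_coins: int) -> Tuple[float, float, float]:
    """Return the 1, 2 and 3 sigma distances for a bucket with n fair coins.

    Under the normal approximation, about 68%, 95% and 99.7% of noisy answers
    fall within these distances of the true answer.
    """
    sigma = math.sqrt(n_coins) / 2
    return sigma, 2 * sigma, 3 * sigma


large_scale = [round(b, 2) for b in error_bounds(929)]
deployment = error_bounds(16)
```

Note that the bounds depend only on n, not on c: with more clients the absolute error stays the same, so the relative error shrinks.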
Practical Considerations (2/2)
- Non-numeric queries
  - Map the query into a numeric query.
  Example:
  "Which website do you visit most often?"
  Map each website the analyst wishes to learn about to a numeric value (a bucket).
  This can require a large number of buckets; answers are limited to 5000 buckets.
- Sybils
  - The design is susceptible to Sybil attacks (a single client can masquerade as multiple clients).
  - The proxy can limit the number of clients selected from a single IP address for a given query.
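Both ideas can be sketched in a few lines. `website_buckets`, `encode_most_visited`, `cap_per_ip` and the per-IP limit are hypothetical; only the 5000-bucket cap comes from the slide.

```python
from typing import Dict, Iterable, List, Tuple

MAX_BUCKETS = 5000  # bucket limit stated on the slide


def website_buckets(sites: List[str]) -> Dict[str, int]:
    """Map each website the analyst wants to learn about to a bucket index."""
    if len(sites) > MAX_BUCKETS:
        raise ValueError(f"at most {MAX_BUCKETS} buckets are allowed")
    return {site: i for i, site in enumerate(sites)}


def encode_most_visited(site: str, buckets: Dict[str, int]) -> List[int]:
    """One bit per bucket; sites the analyst did not list give all zeros."""
    bits = [0] * len(buckets)
    if site in buckets:
        bits[buckets[site]] = 1
    return bits


def cap_per_ip(candidates: Iterable[Tuple[str, str]], max_per_ip: int) -> List[str]:
    """Select client IDs, admitting at most max_per_ip clients per IP address."""
    counts: Dict[str, int] = {}
    selected: List[str] = []
    for client_id, ip in candidates:
        if counts.get(ip, 0) < max_per_ip:
            counts[ip] = counts.get(ip, 0) + 1
            selected.append(client_id)
    return selected


mapping = website_buckets(["a.com", "b.com", "c.com"])
```

An IP cap raises the cost of a Sybil attack rather than preventing it, since an attacker with many addresses can still register many clients.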
Implementation and Deployment (1/2)
- Client
  - Firefox add-on
  - 9600 lines of Java code
  - Information is stored in local SQLite storage:
    - web browsing activities;
    - certain online shopping activities;
    - certain ad interactions;
    - can be extended to capture any online activity.
  - Every 5 minutes, the client connects to the proxy to retrieve pending queries and to return answers and periodically generated coins.
Implementation and Deployment (2/2)
- Proxy
  - Web service on Tomcat 6.0.33
  - 3600 lines of code
  - Proxy state is stored in a MySQL database.
- Analyst
  - 800 lines of code
Deployment
- Correctness verified on a set of local machines.
- 600+ real clients.
Comparison: “Paillier-based” design
- Honest-but-curious proxy
- Paillier cryptosystem
  - Additive homomorphism
  - The proxy can directly sum all clients' encrypted binary answers to get the encrypted sum for each bucket.
- Because Paillier can encrypt any integer, a single malicious client can substantially distort the result.
  - Zero-knowledge proofs (ZKPs) are used to ensure that encrypted answers are '1' or '0'.
- The proxy knows exactly how much noise has been added.
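A toy Paillier implementation (tiny primes, no security, and no ZKPs) shows why the proof is needed: the homomorphic product decrypts to the sum of whatever integers the clients encrypted, so one client encrypting 1000 instead of a bit adds 1000 to the bucket total. The function names are illustrative, and the sketch uses the common choice g = n + 1.

```python
import math
import secrets
from functools import reduce
from typing import Iterable, Tuple

P, Q = 10007, 10009  # toy primes for illustration only


def keygen(p: int = P, q: int = Q) -> Tuple[int, Tuple[int, int]]:
    """Return public modulus n and private key (lambda, mu), assuming g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(value: int, n: int) -> int:
    """E(v) = (1 + n)^v * r^n mod n^2; any integer 0 <= v < n is accepted."""
    n2 = n * n
    r = secrets.randbelow(n - 2) + 2
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 2) + 2
    return (1 + value * n) * pow(r, n, n2) % n2  # (1 + n)^v = 1 + v*n mod n^2


def decrypt(c: int, n: int, priv: Tuple[int, int]) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def encrypted_sum(ciphertexts: Iterable[int], n: int) -> int:
    """Additive homomorphism: the product of ciphertexts encrypts the sum."""
    return reduce(lambda a, b: a * b % (n * n), ciphertexts, 1)


n, priv = keygen()
```

In PDDP the same attack fails, because a GM ciphertext can only ever decrypt to 0 or 1.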
Evaluation (1/5)
- Client performance
  - Clients encrypt a binary value for each bucket, using either:
    - the GM cryptosystem (PDDP), or
    - the Paillier cryptosystem (Paillier-based design).
Evaluation (2/5)
- Proxy and analyst performance
  - Proxy
    PDDP
    - One encryption and one homomorphic XOR per unbiased coin.
    - A Jacobi symbol check on received coins and answer values (faster than a decryption).
    Paillier-based
    - One ZKP verification for each client answer in each bucket.
    - Homomorphically sum all client answers per bucket.
    - Add noise to each per-bucket sum.
Evaluation (3/5)
- Proxy and analyst performance
  - Analyst
    PDDP
    - Decrypt all encrypted values in each bucket.
    Paillier-based
    - Decrypt one encrypted value per bucket.
Evaluation (4/5)
- Bandwidth overhead
  - In both systems, a client transmits an encrypted answer for each bucket.
  - In PDDP, a client also periodically transmits a generated coin to the proxy.
  - In the Paillier-based design, a client also transmits a ZKP for each bucket.
- Storage overhead
  - In PDDP, the proxy stores all clients' answer values for each bucket plus the required number of coins.
  - In the Paillier-based design, the proxy stores only one answer value per bucket.
Evaluation (5/5)
- Querying the client deployment
  - Parameters:
    c = 250 (out of 600 clients)
    ε = 5.0
    Clients are selected as they connect, until 250 unique clients have been queried or 24 hours have elapsed.
    These parameters result in 16 coins per bucket.
  - This ensures that each per-bucket aggregate answer is within ±2, ±4, and ±6 of the noise-free answer with probabilities of 68%, 95%, and 99.7%, respectively.
Future Work
- Support for statistical learning algorithms
- Scalability of non-numeric queries
  - Bloom filters: map a large number of possible answers into a small number of buckets.
- Gather statistical data in a large-scale experiment.
- Weaken the proxy trust requirements.
  - Use trusted hardware (TPM).
- General: measure the actual privacy loss under differential privacy.
Conclusion
- PDDP: Practical Distributed Differential Privacy System
  - Scales well
  - Tolerates churn
  - Places a tight bound on malicious users' capabilities
- Key insights
  - Binary answer in each bucket
  - Blind noise addition
Questions?

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 

Recently uploaded (20)

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 

Towards Statistical Queries over Distributed Private User Data

  • 1. Towards Statistical Queries over Distributed Private User Data
    R. Chen, A. Reznichenko, P. Francis (MPI-SWS, Germany); J. Gehrke (Cornell University, USA)
    Serafeim Chatzopoulos, M1258, schatz@di.uoa.gr
    MDE519 – Distributed Systems, Instructor: Mema Roussopoulou, May 31, 2013
  • 2. User Privacy
    - User data is exposed to organizations in many ways.
    - Users are aware that their data is being exposed when they:
      - make a purchase in an online store;
      - update a profile on a social network.
    - Users are unaware that their data is being exposed to:
      - third-party trackers;
      - smartphone apps.
  • 3. The "user-owned and operated" principle
    - Personal data should be stored on a local host or a cloud device under the user's control, and released only in a controlled, limited, or noisy fashion.
    - Users must have exclusive control of their own data and must be able to share it selectively or voluntarily.
  • 4. Motivation and Problem
    - Distributed private user data is valuable: analysts could use it to
      - understand users' behavior;
      - discover statistical patterns;
      - evaluate proposed enhancements.
    - How can statistical queries be run over such distributed private user data while still preserving privacy?
  • 5. Related Work
    - Anonymization: removes well-known personally identifiable information (PII).
    - Randomization: adds random distortion values to user data.
    - k-anonymity, l-diversity, t-closeness.
    - Differential privacy.
  • 6. Differential Privacy
    - Differential privacy adds noise to the output of a computation (i.e., the answer to a query).
    - It hides the presence or absence of any single record in the dataset.
    - It makes no assumptions about the adversary's background knowledge.
    - Some form of distributed differential privacy is required...
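    The slides use ε without defining it. For reference, the standard definition is given below (the notation M, D, D', S is ours, not the deck's): a randomized mechanism M is (ε, δ)-differentially private if, for all datasets D and D' that differ in a single record and all sets of outputs S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

    Setting δ = 0 gives pure ε-differential privacy; binomial (coin-flip) noise of the kind used later in this deck is usually analyzed in the relaxed (ε, δ) form. A smaller ε means a stronger privacy guarantee and, for the same query, more noise.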
  • 7. Prior Distributed Differential Privacy Designs
    - The first design (Dwork et al., EUROCRYPT '06) has a per-user computational load of O(U), i.e., linear in the number of users U: poor scalability.
    - Follow-up designs (Rastogi and Nath, SIGMOD '10; Shi et al., NDSS '11) reduce the per-user computational load to O(1), but rely on expensive secret-sharing protocols and do not tolerate churn.
    - Recent designs (Götz and Nath, MSR-TR '11) introduce two honest-but-curious servers that collaboratively compute the query result, but even a single malicious user can substantially distort the result.
  • 8. Practical Distributed Differential Privacy System (PDDP)
    - Goals:
      - The differential privacy guarantee is always maintained for every honest client.
      - Place a tight bound on the extent to which a malicious user can distort query results: the maximum absolute distortion in the final result is bounded by the number of malicious users.
      - Operate at a large scale (millions of users).
      - Tolerate churn: clients going offline does not prevent results from being produced.
  • 9. PDDP Components
    - Analyst: issues queries to the system and collects the answers.
    - Proxy: adds differentially private noise to the clients' answers to preserve privacy.
    - Clients: locally maintain their own data and answer queries.
  • 10. Security Assumptions (1/2)
    - General assumptions:
      - Clients have the correct public keys for the analyst and the proxy.
      - The analyst and the proxy have the correct public keys for each other.
      - The corresponding private keys are kept secure.
    - The analyst is potentially malicious (tries to violate users' privacy). It may:
      - collude with other analysts;
      - pretend to be multiple distinct analysts;
      - take control of clients and use the PDDP protocol to reveal information;
      - publish its collected answers;
      - intercept and modify all messages.
  • 11. Security Assumptions (2/2)
    - The proxy is honest-but-curious (HbC). It:
      - follows the specified protocol;
      - tries to exploit any additional information it can learn in doing so;
      - does not collude with other components.
    - Clients are potentially malicious (try to distort the statistical results learned by analysts). They:
      - exhibit churn;
      - have limited resources for computation and data transmission;
      - may generate false or illegitimate answers;
      - may act as Sybils.
  • 12. PDDP Key insights – Binary answer
    - How can query result distortion be limited?
      - Split the answer's value range into buckets.
      - Enforce a binary answer in each bucket, using the Goldwasser-Micali (GM) bit cryptosystem.
    - Example query: "SELECT age FROM info WHERE gender='m'"
      - 4 buckets: 0–12, 13–20, 21–59, and ≥60.
      - Answer: '1' or '0' per bucket.
    - As a result, malicious clients cannot substantially distort the query result.
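    The sketch below illustrates the binary-answer idea in plaintext Python. The bucket list mirrors the slide's age example; `BUCKETS`, `bucketize`, and `aggregate` are hypothetical names, and encryption is left out. Because each bucket carries one bit, even a client that sets every bit to 1 shifts each per-bucket count by at most 1.

```python
from typing import List, Optional, Tuple

# Hypothetical bucket definition mirroring the slide's example:
# 0-12, 13-20, 21-59 and >=60 (an upper bound of None means "no upper bound").
BUCKETS: List[Tuple[int, Optional[int]]] = [(0, 12), (13, 20), (21, 59), (60, None)]


def bucketize(value: int, buckets=BUCKETS) -> List[int]:
    """Return one bit per bucket: 1 if `value` falls in that bucket, else 0."""
    bits = []
    for low, high in buckets:
        inside = value >= low and (high is None or value <= high)
        bits.append(1 if inside else 0)
    return bits


def aggregate(answers: List[List[int]]) -> List[int]:
    """Sum the per-bucket bits over all clients (plaintext view, no noise)."""
    return [sum(column) for column in zip(*answers)]


honest = [bucketize(age) for age in (8, 17, 34, 35, 71)]
# A malicious client can at worst set every bit to 1; it still adds
# at most 1 to each bucket's count.
malicious = [1, 1, 1, 1]
print(aggregate(honest))                # [1, 1, 2, 1]
print(aggregate(honest + [malicious]))  # [2, 2, 3, 2]
```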
  • 13. PDDP Key insights – Blind noise
    - How can differential privacy be achieved?
      - The honest-but-curious proxy generates additional binary answers in each bucket as differentially private noise.
    - Problem: if the analyst publishes the final noisy result and the proxy knows the noise it added, the proxy can subtract that noise from the published result to obtain a noise-free result.
    - Solution: the proxy can only add noise blindly:
      - it knows that the added noise is enough to achieve differential privacy;
      - it does not know the exact value of the noise added.
  • 14. PDDP Workflow – Step 1: Query initialization
    - The analyst first issues a query to the proxy. The message consists of 4 items:
      - Query: SELECT age FROM info WHERE gender='m'
      - Buckets: 0–12, 13–20, 21–59, and ≥60
      - Number of clients queried (c): 1000
      - DP parameter (ε): 1.0, which controls the trade-off between the accuracy of the computation and the strength of its privacy guarantee.
  • 15. PDDP Workflow – Step 2: Query forwarding
    - The proxy selects clients and sends them the query. It:
      - rejects the query if c is too low or too high;
      - rejects the query if ε exceeds the maximum allowed value (a larger ε means weaker privacy);
      - selects c unique clients and sends them the query, under one of the following policies:
        - select c clients at random and wait for them to connect;
        - select the first c clients that connect.
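    A minimal sketch of the proxy's admission check, assuming a hypothetical `Query` record that holds the four items from step 1. The slides do not state the actual limits, so `MIN_CLIENTS`, `MAX_CLIENTS`, and `MAX_EPSILON` are placeholder values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    """The four items the analyst sends to the proxy (step 1)."""
    sql: str
    buckets: List[str]
    c: int          # number of clients to query
    epsilon: float  # differential privacy parameter


# Hypothetical proxy policy limits; the slides do not give concrete values.
MIN_CLIENTS = 100
MAX_CLIENTS = 1_000_000
MAX_EPSILON = 5.0


def validate(query: Query) -> None:
    """Raise ValueError if the proxy should reject the query."""
    if not MIN_CLIENTS <= query.c <= MAX_CLIENTS:
        raise ValueError(f"c={query.c} outside [{MIN_CLIENTS}, {MAX_CLIENTS}]")
    if query.epsilon <= 0 or query.epsilon > MAX_EPSILON:
        raise ValueError(f"epsilon={query.epsilon} not in (0, {MAX_EPSILON}]")


q = Query("SELECT age FROM info WHERE gender='m'",
          ["0-12", "13-20", "21-59", ">=60"], c=1000, epsilon=1.0)
validate(q)  # accepted: no exception raised
```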
  • 16. PDDP Workflow – Step 3 (1/2): Client response
    - Clients execute the query and send their answers.
    - Each client executes the query over its local data and produces an answer of '1' or '0' per bucket; more than one bucket may contain a '1'.
    - Each per-bucket answer value is individually encrypted with the analyst's public key (GM cryptosystem).
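    The following sketch shows how a client might turn a query over its local data into one bit per bucket. It assumes a hypothetical `info(age, gender)` table (taken from the example query) in an in-memory SQLite database standing in for the client's local store; `answer_query` is an illustrative name, and the per-bucket GM encryption step is omitted.

```python
import sqlite3
from typing import List, Optional, Tuple

BUCKETS: List[Tuple[int, Optional[int]]] = [(0, 12), (13, 20), (21, 59), (60, None)]


def answer_query(conn: sqlite3.Connection, sql: str, buckets=BUCKETS) -> List[int]:
    """Run `sql` on the client's local data and set a bucket's bit to 1 if any
    returned value falls in it (so several buckets can be 1 at once)."""
    bits = [0] * len(buckets)
    for (value,) in conn.execute(sql):
        for i, (low, high) in enumerate(buckets):
            if value >= low and (high is None or value <= high):
                bits[i] = 1
    return bits


# Hypothetical local table standing in for the client's SQLite storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (age INTEGER, gender TEXT)")
conn.executemany("INSERT INTO info VALUES (?, ?)", [(34, "m"), (9, "m"), (40, "f")])
print(answer_query(conn, "SELECT age FROM info WHERE gender='m'"))  # [1, 0, 1, 0]
```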
  • 17. PDDP Workflow – Step 3 (2/2)
    - Goldwasser-Micali (GM) cryptosystem:
      - single-bit cryptosystem, which enforces a binary answer in each bucket;
      - very efficient;
      - XOR-homomorphic: E(a) * E(b) mod m = E(a XOR b), where m is part of the analyst's public key.
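    To make the XOR homomorphism concrete, here is a toy version of the textbook Goldwasser-Micali scheme, not the PDDP implementation. The primes are tiny and insecure, the modulus called m on the slide is `N` here, and the helper names are hypothetical. Choosing both primes congruent to 3 mod 4 lets x = N - 1 serve as the public quadratic non-residue.

```python
import math
import secrets

# Toy parameters for illustration only: real deployments use very large primes.
# Both primes are 3 mod 4, so X = N - 1 (i.e. -1) is a quadratic non-residue
# modulo P and modulo Q.
P, Q = 499, 547
N = P * Q
X = N - 1  # public key is (N, X); private key is (P, Q)


def _random_unit(n: int) -> int:
    """Random y in [1, n) with gcd(y, n) == 1."""
    while True:
        y = secrets.randbelow(n - 1) + 1
        if math.gcd(y, n) == 1:
            return y


def encrypt(bit: int) -> int:
    """GM encryption of a single bit: y^2 * X^bit mod N."""
    if bit not in (0, 1):
        raise ValueError("GM encrypts exactly one bit")
    y = _random_unit(N)
    return (y * y * pow(X, bit, N)) % N


def decrypt(c: int) -> int:
    """0 if c is a quadratic residue mod P (Euler's criterion), else 1."""
    return 0 if pow(c, (P - 1) // 2, P) == 1 else 1


def xor_homomorphic(c1: int, c2: int) -> int:
    """E(a) * E(b) mod N decrypts to a XOR b."""
    return (c1 * c2) % N


for a in (0, 1):
    for b in (0, 1):
        assert decrypt(xor_homomorphic(encrypt(a), encrypt(b))) == a ^ b
```

    Every ciphertext decrypts to a single bit, which is what caps a client's contribution to each bucket at 1.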
  • 18. PDDP Workflow – Step 4: Blind noise addition
    - The proxy maintains a pool of additional binary answers, called coins, and adds them as noise to each bucket.
      - Coins must be unbiased.
      - Coins are encrypted with the analyst's public key.
    - The proxy must add n coins to each bucket, where n depends on c and ε.
    - How can coins be generated blindly?
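    The slide's formula for n did not survive extraction. The sketch below assumes n = ⌊64 ln(2c) / ε²⌋ + 1, which is how we recall the PDDP paper's coin count; treat it as an assumption. It is at least consistent with this deck: it gives 929 coins for c = 10^6, ε = 1.0 (σ = √(929/4) ≈ 15.24, the figure quoted under Practical Considerations) and 16 coins for c = 250, ε = 5.0 (the deployment query). `coins_per_bucket` is a hypothetical name.

```python
import math


def coins_per_bucket(c: int, epsilon: float) -> int:
    """Number of unbiased coins the proxy adds to each bucket.

    Assumed formula: n = floor(64 * ln(2c) / epsilon^2) + 1.
    """
    return math.floor(64 * math.log(2 * c) / epsilon ** 2) + 1


print(coins_per_bucket(10**6, 1.0))  # 929
print(coins_per_bucket(250, 5.0))    # 16
print(coins_per_bucket(1000, 1.0))   # 487
```

    Under this formula the noise grows only logarithmically with the number of clients but quadratically as ε shrinks.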
  • 19. Coin pool generation
    - Straightforward approaches:
      - The proxy generates the coins: a curious proxy could then learn the noise-free result.
      - The clients generate the coins: malicious clients could generate biased coins.
  • 20. Collaborative coin generation
    - The paper's approach:
      - Each online client periodically generates an encrypted unbiased coin E(oc) and sends it to the proxy.
      - The proxy receives the coin and verifies its legitimacy.
      - The proxy blindly re-flips the coin by multiplying E(oc) with a locally generated, encrypted unbiased coin E(op), modulo m: E(oc) * E(op) mod m = E(oc XOR op), where m is part of the analyst's public key.
      - The proxy stores the resulting coin in its locally maintained pool.
    - Because op is unbiased and independent of oc, oc XOR op is unbiased even if a malicious client biased oc; because the proxy cannot decrypt E(oc), it does not know the actual value of the generated coin.
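    A sketch of the re-flip step, built on a toy GM scheme with tiny, insecure primes; `client_coin` and `proxy_reflip` are hypothetical names, and the proxy's legitimacy check on incoming coins is omitted. The simulation has a malicious client always send E(1) and shows that the pool still decrypts to roughly half ones.

```python
import math
import secrets

# Toy GM parameters (insecure, illustration only); both primes are 3 mod 4.
P, Q = 499, 547
N = P * Q   # the slide's "m": part of the analyst's public key
X = N - 1   # public quadratic non-residue


def encrypt(bit: int) -> int:
    while True:
        y = secrets.randbelow(N - 1) + 1
        if math.gcd(y, N) == 1:
            return (y * y * pow(X, bit, N)) % N


def decrypt(c: int) -> int:  # analyst only: needs the private factor P
    return 0 if pow(c, (P - 1) // 2, P) == 1 else 1


def client_coin() -> int:
    """An honest client flips a fair coin and sends only its encryption E(oc)."""
    return encrypt(secrets.randbelow(2))


def proxy_reflip(enc_oc: int) -> int:
    """Multiply by the proxy's own encrypted fair coin E(op):
    E(oc) * E(op) mod N = E(oc XOR op). The proxy learns neither oc nor the result."""
    op = secrets.randbelow(2)
    return (enc_oc * encrypt(op)) % N


# A malicious client that always sends E(1) still cannot bias the pool,
# because op is an independent fair coin.
pool = [proxy_reflip(encrypt(1)) for _ in range(10_000)]
ones = sum(decrypt(c) for c in pool)  # only the analyst could compute this
print(ones / len(pool))               # close to 0.5
```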
  • 21. PDDP Workflow – Step 5: Noisy answers to the analyst
    - Each bucket holds the clients' answers plus the coins (noise).
    - After a random delay, the proxy shuffles the c + n values in each bucket, which prevents a client from being identified by the vector of '1's and '0's in its answer.
    - Finally, the analyst:
      - decrypts all the encrypted binary values with its private key;
      - sums the plaintext values obtained;
      - obtains the noisy count of clients that fall within each bucket.
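    A plaintext simulation of one bucket in step 5, with encryption left out so the arithmetic is visible. The values c = 1000, n = 487 coins, and a true count of 300 are illustrative assumptions, and `proxy_mix` and `analyst_estimate` are hypothetical names. The analyst subtracts n/2, the expected sum of n unbiased coins.

```python
import random
from typing import List


def proxy_mix(answers: List[int], coins: List[int], rng: random.Random) -> List[int]:
    """Combine the c client bits and n coins for one bucket and shuffle them,
    so the analyst cannot tell which values came from clients."""
    values = answers + coins
    rng.shuffle(values)
    return values


def analyst_estimate(values: List[int], n: int) -> float:
    """Sum the (decrypted) bits and subtract the expected coin total n/2."""
    return sum(values) - n / 2


rng = random.Random(7)                         # fixed seed for reproducibility
c, n = 1000, 487                               # illustrative parameters
answers = [1] * 300 + [0] * (c - 300)          # true count in this bucket: 300
coins = [rng.randint(0, 1) for _ in range(n)]  # unbiased coins
noisy = analyst_estimate(proxy_mix(answers, coins, rng), n)
print(noisy)  # 300 plus noise with standard deviation sqrt(n/4), about 11
```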
  • 22. Practical Considerations (1/2)
    - The utility of the aggregate result depends on the amount of added noise.
    - The noise is the sum of the n coins added by the proxy minus the analyst's mean adjustment of n/2; it follows a (shifted) binomial distribution, which is approximated by the normal distribution N(0, n/4).
    - Example: c = 10^6, ε = 1.0. Under the normal approximation in each bucket, there is a
      - 68% probability that the noisy answer is within 15.24 of the true answer;
      - 95% probability that it is within 30.48;
      - 99.7% probability that it is within 45.72.
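    The quoted intervals follow from the 1σ/2σ/3σ rule with σ = √(n/4). Working backwards, 15.24 corresponds to n ≈ 4 × 15.24² ≈ 929 coins, which the sketch below uses; `noise_bounds` is a hypothetical helper.

```python
import math


def noise_bounds(n: int):
    """Half-widths of the 68%, 95% and 99.7% intervals under the normal
    approximation N(0, n/4) of the binomial coin noise."""
    sigma = math.sqrt(n / 4)
    return [k * sigma for k in (1, 2, 3)]


print([round(b, 2) for b in noise_bounds(929)])  # [15.24, 30.48, 45.72]
```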
  • 23. Practical Considerations (2/2)
    - Non-numeric queries: map the query into a numeric query.
      - Example: "Which website do you visit most often?" Map each website the analyst wishes to learn about to a numeric value.
      - This can require a large number of buckets; the answer is limited to 5000 buckets.
    - Sybils: the design is susceptible to Sybil attacks (a single client masquerading as multiple clients).
      - The proxy can limit the number of clients selected from a single IP address for a given query.
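    One way the proxy could apply a per-IP limit while following the "select the first c clients that connect" policy from step 2. `MAX_CLIENTS_PER_IP` and `select_clients` are hypothetical; the slides do not say what limit PDDP uses or how it is enforced.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical per-query limit; the slides do not specify a value.
MAX_CLIENTS_PER_IP = 2


def select_clients(connections: Iterable[Tuple[str, str]], c: int,
                   max_per_ip: int = MAX_CLIENTS_PER_IP) -> List[str]:
    """Pick the first c unique clients that connect, skipping clients whose
    IP address has already reached its quota for this query."""
    chosen: List[str] = []
    seen = set()
    per_ip: Counter = Counter()
    for client_id, ip in connections:
        if client_id in seen or per_ip[ip] >= max_per_ip:
            continue
        seen.add(client_id)
        per_ip[ip] += 1
        chosen.append(client_id)
        if len(chosen) == c:
            break
    return chosen


arrivals = [("a", "10.0.0.1"), ("b", "10.0.0.1"), ("sybil1", "10.0.0.1"),
            ("a", "10.0.0.1"), ("d", "10.0.0.2"), ("e", "10.0.0.3")]
print(select_clients(arrivals, c=3))  # ['a', 'b', 'd']
```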
  • 24. Implementation and Deployment (1/2)
    - Client:
      - Firefox add-on, 9600 lines of Java code.
      - Information is stored in local SQLite storage: web browsing activities, certain online shopping activities, and certain ad interactions; it can be extended to capture any online activity.
      - Every 5 minutes, the client connects to the proxy to retrieve pending queries and to return answers and periodically generated coins.
  • 25. Implementation and Deployment (2/2)
    - Proxy: web service on Tomcat 6.0.33, 3600 lines of code; proxy state is kept in a MySQL database.
    - Analyst: 800 lines of code.
    - Deployment:
      - correctness verified on a set of local machines;
      - 600+ real clients.
  • 26. Comparison: "Paillier-based" design
    - Honest-but-curious proxy.
    - Paillier cryptosystem, which is additively homomorphic: the proxy can directly sum all clients' encrypted binary answers to get the encrypted sum for each bucket.
    - A single malicious client can substantially distort the result, so zero-knowledge proofs (ZKPs) are used to ensure that encrypted answers are '1' or '0'.
    - The proxy knows exactly how much noise has been added.
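    To see why the Paillier-based design needs zero-knowledge proofs, here is a toy version of the textbook Paillier scheme with g = n + 1 (tiny, insecure primes; hypothetical helper names). Summing ciphertexts works as described, but nothing in a ciphertext stops a client from encrypting a value other than 0 or 1.

```python
import math
import secrets

# Toy Paillier parameters (insecure, illustration only).
p, q = 499, 547
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                                 # valid because g = n + 1


def encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(c: int) -> int:
    u = pow(c, lam, n2)
    return ((u - 1) // n * mu) % n


def add(c1: int, c2: int) -> int:
    """Additive homomorphism: E(a) * E(b) mod n^2 = E(a + b)."""
    return (c1 * c2) % n2


total = encrypt(0)
for bit in (1, 0, 1, 1):                     # four honest binary answers
    total = add(total, encrypt(bit))
print(decrypt(total))                        # 3

# Without a ZKP, nothing stops one client from encrypting 500 instead of a bit.
print(decrypt(add(total, encrypt(500))))     # 503
```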
  • 27. Evaluation (1/5)
    - Client performance: clients encrypt a binary value for each bucket, using
      - the GM cryptosystem (PDDP);
      - the Paillier cryptosystem (Paillier-based design).
  • 28. Evaluation (2/5)
    - Proxy - Analyst performance: proxy
      - PDDP:
        - one encryption and one homomorphic XOR per unbiased coin;
        - a Jacobi symbol check on received coins and answer values (faster than a decryption).
      - Paillier-based:
        - verify one ZKP for each client answer in each bucket;
        - homomorphically sum all clients' answers per bucket;
        - add noise to each per-bucket total sum.
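    A GM ciphertext y²·x^b mod N always has Jacobi symbol +1 with respect to N (when the public x is chosen with Jacobi symbol +1), and the Jacobi symbol can be computed from N alone, without knowing its factors. The sketch below shows such a check with toy parameters and hypothetical names; exactly what the slide's "Jacobi symbol checking" involves in PDDP is an assumption here.

```python
import math
import secrets

P, Q = 499, 547   # toy primes (the analyst's secret), both 3 mod 4
N = P * Q         # public modulus
X = N - 1         # public non-residue with Jacobi symbol +1


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, via quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:              # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def gm_encrypt(bit: int) -> int:
    while True:
        y = secrets.randbelow(N - 1) + 1
        if math.gcd(y, N) == 1:
            return (y * y * pow(X, bit, N)) % N


def proxy_accepts(value: int) -> bool:
    """Proxy-side sanity check that uses only the public modulus N."""
    return 0 < value < N and jacobi(value, N) == 1


assert all(proxy_accepts(gm_encrypt(b)) for b in (0, 1) for _ in range(100))
print(proxy_accepts(5))  # False: (5/N) = -1, so 5 is not a valid GM ciphertext
```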
  • 29. Evaluation (3/5)
    - Proxy - Analyst performance: analyst
      - PDDP: decrypts all encrypted values in each bucket.
      - Paillier-based: decrypts one encrypted value in each bucket.
  • 30. Evaluation (4/5)
    - Bandwidth overhead:
      - In both systems, a client transmits an encrypted answer for each bucket.
      - In PDDP, a client also transmits periodically generated coins to the proxy.
      - In the Paillier-based design, a client also transmits a ZKP for each bucket.
    - Storage overhead:
      - In PDDP, the proxy stores all clients' answer values for each bucket plus the required number of coins.
      - In the Paillier-based design, the proxy stores only one answer value per bucket.
  • 31. Evaluation (5/5)
    - Querying the client deployment, with parameters c = 250 (out of 600 clients) and ε = 5.0:
      - clients are selected as they connect, until 250 unique clients have been queried or 24 hours have expired;
      - these parameters result in 16 coins per bucket.
    - This ensures that each per-bucket aggregate answer is within ±2, ±4, and ±6 of the noise-free answer with probability 68%, 95%, and 99.7%, respectively.
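    The ±2/±4/±6 figures are the normal-approximation intervals for σ = √(16/4) = 2. With so few coins, the exact binomial probabilities are easy to compute and come out somewhat higher than 68%, 95%, and 99.7%, so the quoted figures are conservative. A short check, with `prob_within` a hypothetical helper:

```python
from math import comb


def prob_within(n: int, k: float) -> float:
    """Exact P(|S - n/2| <= k) for S ~ Binomial(n, 1/2): the probability that
    the coin noise in one bucket moves the count by at most k."""
    hits = sum(comb(n, s) for s in range(n + 1) if abs(s - n / 2) <= k)
    return hits / 2 ** n


for k in (2, 4, 6):
    print(k, round(prob_within(16, k), 4))
# 2 0.7899
# 4 0.9787
# 6 0.9995
```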
  • 32. Future Work
    - Support for statistical learning algorithms.
    - Scalability of non-numeric queries: Bloom filters could map a large number of possible answers into a small number of buckets.
    - Gather statistical data in a large-scale experiment.
    - Weaken the proxy trust requirements, e.g., by using trusted hardware (TPM).
    - In general: measure the actual privacy loss under differential privacy.
  • 33. Conclusion
    - PDDP: Practical Distributed Differential Privacy System
      - scales well;
      - tolerates churn;
      - places a tight bound on a malicious user's ability to distort results.
    - Key insights:
      - binary answer in each bucket;
      - blind noise addition.
  • 34. Questions?