Presented at WAIS 2014.


- 1. PRIVACY FOR CONTINUAL DATA PUBLISHING Junpei Kawamoto, Kouichi Sakurai (Kyushu University, Japan) This work is partly supported by Grants-in-Aid for Scientific Research (B)(23300027), Japan Society for the Promotion of Science (JSPS)
- 2. Analysis of Location Data (Big Data) • We can easily gather location data from GPS, etc. • Example analyses: which crossroads are dangerous? Finding car accidents quickly, finding available roads, counting frequent patterns, change point detection, etc.
- 3. Privacy for Publishing Location Data • A collector publishes people's location data to analysts. • Location data sometimes should be kept secret: someone may want to keep where they have been secret. • Privacy-preserving data publishing is therefore necessary.
- 4. Assumptions about the Collector • The collector gathers people's locations and publishes a histogram of POI counts for every time span, e.g. t = 1: A = 15300, B = 30000; t = 2: A = 15200, B = 30100; t = 3: A = 15000, B = 30300. • We argue what kind of privacy the collector should guarantee. (A minimal sketch of this publishing model follows below.)
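A minimal sketch of this publishing model, assuming hypothetical location records of the form (person_id, time_span, poi); the record format and names are illustrative, not from the slides.

```python
from collections import Counter

# Hypothetical location records: (person_id, time_span, poi).
records = [
    ("u1", 1, "A"), ("u2", 1, "B"), ("u3", 1, "B"),
    ("u1", 2, "A"), ("u2", 2, "A"), ("u3", 2, "B"),
]

def publish_histograms(records):
    """Group records by time span and count people per POI."""
    histograms = {}
    for _, t, poi in records:
        histograms.setdefault(t, Counter())[poi] += 1
    return histograms

# One histogram pi(t) per time span is handed to the analyst.
for t, hist in sorted(publish_histograms(records).items()):
    print(f"t = {t}: {dict(hist)}")
```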
- 5. Related Work: Differential Privacy [1] • The de facto standard privacy definition. • Keeps secret whether any person's locations are included in the histograms. • Adds Laplace noise to the histograms, drawn from the density (1/(2φ)) exp(−|x − µ|/φ). • Guarantees privacy against attacks using any kind of background knowledge. • However, the added noise is too large in less-populated areas (figure: the number of people in a less-populated area). [1] C. Dwork, F. McSherry, K. Nissim, A. Smith, "Calibrating noise to sensitivity in private data analysis", Proc. of the Third Conference on Theory of Cryptography, pp. 265-284, 2006.
- 6. Related Work: Differential Privacy [1] (continued) • Differential privacy guarantees privacy against attacks using any kind of background knowledge, but the Laplace noise it adds is too large in less-populated areas. • Our objective: to construct a privacy definition for private histograms that preserves the utility of the outputs as much as possible. (A sketch of the Laplace-noise baseline follows below.) [1] C. Dwork, F. McSherry, K. Nissim, A. Smith, "Calibrating noise to sensitivity in private data analysis", Proc. of the Third Conference on Theory of Cryptography, pp. 265-284, 2006.
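For reference, a minimal sketch of the Laplace-noise baseline on a count histogram, assuming each person contributes a single count per time span (sensitivity 1, noise scale 1/ε); this is the standard mechanism, not the authors' code.

```python
import numpy as np

def laplace_histogram(hist, epsilon, rng=None):
    """Differentially private histogram: add Laplace noise with
    scale = sensitivity / epsilon to every count (sensitivity 1
    when each person contributes one count per time span)."""
    rng = rng or np.random.default_rng()
    scale = 1.0 / epsilon
    return {poi: count + rng.laplace(0.0, scale) for poi, count in hist.items()}

hist = {"A": 15300, "B": 30000, "C": 12}       # "C" is a less-populated POI
print(laplace_histogram(hist, epsilon=0.1))    # scale 10: negligible for A and B,
                                               # but large relative to the count at C
```

This illustrates the point on the slide: the same noise scale that barely perturbs large counts dominates the counts of less-populated areas.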
- 7. Main Idea of Our Privacy Definition • Differential privacy hides any move. • We assume it is not necessary to hide explicit moves. • Example (public knowledge): the roads toward C and D are under construction, so most people entering from A turn left to B. • If an adversary knows that a victim was in A at time t and that the victim moved to B at time t+1, we do not treat this as a privacy violation.
- 8. Main Idea of Our Privacy Definition • We employ a Markov process to distinguish explicit from implicit moves, e.g. a public Markov process with Pr[A→A] = 0.9, Pr[A→B] = 0.1, Pr[B→A] = Pr[B→B] = 0.5: A→A is an explicit move, A→B is an implicit move, and we focus on the privacy of the implicit move. • We assume that if the outputs give adversaries no more information than the Markov process itself, the outputs are private. • We employ "Adversarial Privacy" [2], a privacy definition that bounds the information that outputs give to adversaries. [2] V. Rastogi, M. Hay, G. Miklau, D. Suciu, "Relationship Privacy: Output Perturbation for Queries with Joins", Proc. of the ACM Symposium on Principles of Database Systems, pp. 107-116, 2009.
- 9. Adversarial Privacy • The definition: p(X) is the adversaries' prior belief in an event X, and p(X | O) is their posterior belief in X after observing an output O; the output O is ε-adversarially private iff for any X, p(X | O) ≤ e^ε p(X). • We need to design X, O, and p for the problem to which adversarial privacy is applied: X is "a person is in POI l_j at time t", i.e. X_t = l_j; O is the published histogram at time t, i.e. π(t); p is an algorithm computing the adversaries' belief. • Designing p for several adversary classes, depending on the use case, is one of our contributions. (A generic check of the definition is sketched below.)
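A minimal sketch of checking the ε-adversarial-privacy condition over a finite set of events, assuming the prior and posterior beliefs are already given as probability dictionaries; the helper name and toy numbers are illustrative.

```python
import math

def is_adversarially_private(prior, posterior, epsilon):
    """Check p(X | O) <= e^epsilon * p(X) for every event X.

    prior, posterior: dicts mapping each event X to the adversary's
    belief before / after observing the output O."""
    bound = math.exp(epsilon)
    return all(posterior[x] <= bound * prior[x] for x in prior)

# Toy events: "the victim is in POI A" / "the victim is in POI B".
prior = {"A": 0.9, "B": 0.1}
posterior = {"A": 0.85, "B": 0.15}
print(is_adversarially_private(prior, posterior, epsilon=0.5))  # True: 0.15 <= e^0.5 * 0.1
```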
- 10. Adversary Classes • Markov-Knowledge Adversary (MK): guesses which POI a victim is in at time t, using the Markov process and the output histograms before time t. • Any-Person-Knowledge Adversary (APK): guesses which POI a victim is in at time t, using the Markov process, the output histograms before time t, and the POI the victim was in at time t − 1.
- 11. Adversary Classes (continued) • MK: guesses which POI a victim is in at time t, using the Markov process and the output histograms before time t. • APK: additionally knows which POI the victim was in at time t − 1. • The APK class is stronger than the MK class; today we focus on the APK class.
- 12. Beliefs of APK-Class Adversaries • Prior belief before observing the output π(t): p(X_t = l_j | X_{t−1} = l_i, (π(t−1)ᵗP)ᵗ, π(t−1); P). • Posterior belief after observing the output π(t): p(X_t = l_j | X_{t−1} = l_i, π(t), π(t−1); P). • Thus, the output π(t) is ε-adversarially private for the APK class iff, for all l_i, l_j, p(X_t = l_j | X_{t−1} = l_i, π(t), π(t−1); P) ≤ e^ε p(X_t = l_j | X_{t−1} = l_i, (π(t−1)ᵗP)ᵗ, π(t−1); P). (The sketch below shows the Markov prediction (π(t−1)ᵗP)ᵗ used in the prior.)
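A minimal sketch of the Markov prediction (π(t−1)ᵗP)ᵗ that an APK adversary can compute from the public Markov process P and the previously published histogram, treating π(t−1) as a column vector of POI counts. It only shows the prediction; the belief algorithm p itself is left to the paper, and the numbers are made up.

```python
import numpy as np

# Public Markov process over POIs [A, B]: P[i, j] = Pr[move from POI i to POI j].
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

pi_prev = np.array([15300.0, 30000.0])    # previously published histogram pi(t-1)

# What the adversary can predict without seeing pi(t): (pi(t-1)^T P)^T = P^T pi(t-1).
predicted = P.T @ pi_prev                  # expected counts per POI at time t
print(predicted)                           # [28770. 16530.]
```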
- 13. Computing Private Histograms • Loss of the modified histogram: with π₀(t) the original histogram at time t and π(t) the adversarially private histogram at time t, loss(π(t), π₀(t)) = ‖π(t) − π₀(t)‖₂. • Computing an adversarially private histogram is an optimization problem: minimize loss(π(t), π₀(t)) subject to, for all l_i, l_j, p(X_t = l_j | X_{t−1} = l_i, π(t), π(t−1); P) ≤ e^ε p(X_t = l_j | X_{t−1} = l_i, (π(t−1)ᵗP)ᵗ, π(t−1); P). • We employ a heuristic algorithm to solve this. (A simplified sketch of the optimization is given below.)
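A simplified sketch of the optimization, assuming, purely as an illustrative proxy (the slides spell out neither the belief algorithm p nor the authors' heuristic), that the constraint is an elementwise e^±ε ratio bound between the published histogram and the Markov prediction.

```python
import numpy as np
from scipy.optimize import minimize

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi_prev = np.array([15300.0, 30000.0])    # published pi(t-1)
pi_orig = np.array([14000.0, 31000.0])    # original histogram pi_0(t)
eps = 0.1

predicted = P.T @ pi_prev                  # adversary's prediction (pi(t-1)^T P)^T

def loss(pi):
    return np.sum((pi - pi_orig) ** 2)     # squared L2 distance to pi_0(t)

# Proxy constraint: each published count stays within a factor e^eps of the
# prediction (a stand-in for the belief-ratio bound on the previous slide).
constraints = [
    {"type": "ineq", "fun": lambda pi: np.exp(eps) * predicted - pi},
    {"type": "ineq", "fun": lambda pi: pi - np.exp(-eps) * predicted},
]

res = minimize(loss, x0=predicted, constraints=constraints, method="SLSQP")
print(res.x)                               # candidate private histogram pi(t)
```

The paper uses the authors' heuristic rather than a general-purpose solver; this sketch only shows the shape of the problem: stay as close as possible to π₀(t) while not deviating too far from what the adversary could already predict.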
- 14. Extension for High-Order Markov Processes • So far we assumed a 1st-order Markov process, so each element of a published histogram is the count of a single POI. • A high-order Markov process lets us publish counts of paths. • We can convert a high-order Markov process into a 1st-order one, e.g. a 2nd-order process whose states are 2-length paths such as A→B, A→D, B→C, B→D. • Then we can publish counts of 2-length paths.
- 15. Extension for High-Order Markov Processes (continued) • Using the conversion of a high-order Markov process into a 1st-order one over 2-length paths (A→B, A→D, B→C, B→D), our proposal guarantees privacy for publishing counts of n-gram paths. (A sketch of the conversion is given below.)
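A minimal sketch of the standard conversion from a 2nd-order Markov process to a 1st-order one whose states are 2-length paths; the transition probabilities are made-up example values, not from the slides.

```python
import numpy as np

# Hypothetical 2nd-order process: Pr[next POI | previous two POIs].
second_order = {
    ("A", "B"): {"C": 0.7, "D": 0.3},
    ("A", "D"): {"C": 0.5, "D": 0.5},
}

# 1st-order states are 2-length paths (i, j); a transition (i, j) -> (j, k)
# inherits Pr[k | i, j], and every other transition has probability 0.
states = sorted(set(second_order) |
                {(j, k) for (i, j), nxt in second_order.items() for k in nxt})
index = {s: n for n, s in enumerate(states)}

P1 = np.zeros((len(states), len(states)))
for (i, j), nxt in second_order.items():
    for k, prob in nxt.items():
        P1[index[(i, j)], index[(j, k)]] = prob

print(states)   # six pair states: (A,B), (A,D), (B,C), (B,D), (D,C), (D,D)
print(P1)       # 1st-order transition matrix over 2-length paths
```

Publishing histograms over these pair states is exactly publishing counts of 2-length paths, so the same privacy machinery applies unchanged.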
- 16. Evaluation • Two mining tasks: change point detection and frequent paths extraction. • Dataset: people moving in Tokyo in 1998, provided by the People Flow Project [3]. • We constructed two small datasets, Shibuya and Machida: Shibuya, with many moving people, evaluates an urban area; Machida, with fewer moving people, evaluates a suburban area. [3] http://pflow.csis.u-tokyo.ac.jp/index-j.html
- 17. Number of People (Shibuya) • Figure: counts over time for Plain (original data), AdvP (proposal), DP-1 (differential privacy, ε = 1), and DP-100 (differential privacy, ε = 100). AdvP is almost the same as the original data, while the differential-privacy curves show errors in less-populated times.
- 18. Change Point Detection (Shibuya) • Figure: change point scores. AdvP (the proposal) has errors during rush hours, but produces no false positives. • DP-1 and DP-100 have many errors; even though DP-100 is a very weak privacy setting, it still has errors.
- 19. Number of People (Machida) • Figure: counts over time. AdvP is almost the same as the original data, while differential privacy adds too much noise.
- 20. Change Point Detection (Machida) • Figure: change point scores. AdvP (the proposal) has errors during rush hours. • DP-1 and DP-100 have errors at all times.
- 21. Frequent Paths Extraction • We employ NDCG [6] to evaluate the accuracy of the outputs (figure: NDCG for Shibuya and Machida; higher is better). [6] K. Järvelin, J. Kekäläinen, "IR evaluation methods for retrieving highly relevant documents," Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 41-48, 2000.
- 22. Frequent Paths Extraction (continued) • The outputs of our proposal achieve better NDCG than differential privacy in both Shibuya and Machida. • Our proposal is therefore effective for publishing counts of paths. (A sketch of the NDCG metric is given below.) [6] K. Järvelin, J. Kekäläinen, "IR evaluation methods for retrieving highly relevant documents," Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 41-48, 2000.
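For reference, a minimal sketch of NDCG over a ranked list of paths, using the log-discounted formulation from [6] with graded relevance taken from the true path counts; the example numbers are made up.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: rel_1 + sum over i >= 2 of rel_i / log2(i)."""
    return sum(rel if i == 1 else rel / math.log2(i)
               for i, rel in enumerate(relevances, start=1))

def ndcg(ranked_relevances):
    """Normalize DCG by the DCG of the ideal (descending) ranking."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# True counts of the top paths, listed in the order the private output ranks them.
print(ndcg([30, 20, 50, 10]))   # < 1.0: the most frequent path (count 50) is ranked third
print(ndcg([50, 30, 20, 10]))   # 1.0: the private output preserves the true ranking
```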
- 23. Conclusion • We propose a new privacy definition that preserves the utility of the outputs as much as possible, assuming a Markov process on people's moves and employing the adversarial privacy framework. • Evaluations with two data mining tasks, change point detection and frequent paths extraction, show that our privacy definition achieves better utility than differential privacy. • Future work: applying the definition to other mining tasks and comparing it with other privacy definitions.
