Privacy for Continual Data Publishing

Published on: WAIS 2014.
Transcript

  • 1. Privacy for Continual Data Publishing. Junpei Kawamoto, Kouichi Sakurai (Kyushu University, Japan). This work is partly supported by a Grant-in-Aid for Scientific Research (B) (23300027) from the Japan Society for the Promotion of Science (JSPS).
  • 2. Analysis of location data (big data). We can easily gather location data from GPS and similar sources. Applications: Which crossroads are dangerous? Finding car accidents quickly; finding available roads; counting frequent patterns; change-point detection; etc.
  • 3. Privacy for publishing location data. We consider publishing people's location data. [Figure: people publish their locations to collectors, who pass the data to an analyst.] Location data should sometimes be kept secret: a person may want to hide where he or she was, so privacy-preserving data publishing is necessary.
  • 4. Assumptions about the collector. The collector gathers people's locations and, every time span, publishes a histogram π of POI counts to the analyst. [Figure: the collector publishes π(1), π(2), π(3). Example histograms: t = 1: A 15000, B 30300; t = 2: A 15200, B 30100; t = 3: A 15300, B 30000.] We argue what kind of privacy the collector should guarantee (see the sketch below).
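
To make the setup concrete, here is a minimal Python sketch of the data the collector publishes, one histogram of POI counts per time span, with the POI names and counts taken from the slide's example:

    import numpy as np

    # POIs and the example histograms π(1), π(2), π(3) from the slide:
    # the collector publishes one count vector per time span.
    POIS = ["A", "B"]
    histograms = {
        1: np.array([15000.0, 30300.0]),   # π(1)
        2: np.array([15200.0, 30100.0]),   # π(2)
        3: np.array([15300.0, 30000.0]),   # π(3)
    }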
  • 5–6. Related work: differential privacy [1]. The de facto standard privacy definition. It keeps secret whether any person's locations are included in the histograms by adding Laplace noise, drawn from the density (1/(2φ)) exp(−|x − µ|/φ), to each histogram cell. It guarantees privacy against attacks using any kind of background knowledge, but the added noise is too large in less-populated areas (see the sketch below). [Figure: the number of people in a less-populated area, original vs. noisy.] Our objective: to construct a privacy definition for publishing private histograms that preserves the utility of the outputs as much as possible. [1] C. Dwork, F. McSherry, K. Nissim, A. Smith, "Calibrating noise to sensitivity in private data analysis", Proc. of the Third Conference on Theory of Cryptography, pp. 265-284, 2006.
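
For contrast, a minimal sketch of the Laplace mechanism the slide refers to (standard differential privacy, not the authors' code; the sensitivity of 1 assumes each person contributes one count per cell). It illustrates the slide's complaint: noise that is negligible on large counts can dominate the small counts of a less-populated area.

    import numpy as np

    def laplace_mechanism(hist, epsilon, sensitivity=1.0, rng=None):
        # Add i.i.d. Laplace(0, φ) noise with φ = sensitivity/ε to every
        # cell, i.e. noise with density (1/(2φ)) exp(−|x − µ|/φ), µ = 0.
        rng = rng if rng is not None else np.random.default_rng()
        return hist + rng.laplace(0.0, sensitivity / epsilon, size=hist.shape)

    rng = np.random.default_rng(0)
    busy = np.array([15000.0, 30300.0])    # counts in a populated area
    sparse = np.array([12.0, 7.0])         # counts in a less-populated area
    print(laplace_mechanism(busy, epsilon=1.0, rng=rng))    # relative error tiny
    print(laplace_mechanism(sparse, epsilon=1.0, rng=rng))  # noise can dominate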
  • 7. Main idea of our privacy definition. Differential privacy hides every move; we assume it is not necessary to hide explicit moves. [Figure: an intersection where the roads to C and D are under construction, so most people entering from A turn left to B; this is public knowledge.] If an adversary knows a victim was at A at time t, then learning that the victim moved to B at time t+1 does not violate privacy.
  • 8. Main idea of our privacy definition (cont.). We employ a Markov process to distinguish explicit from implicit moves, and assume that outputs are private if they give adversaries no more information than the public Markov process does (see the sketch below). [Figure: a two-state Markov process: A→A with probability 0.9 (explicit), A→B with probability 0.1 (implicit; the move whose privacy we focus on), and B→A = B→B = 0.5. The Markov process is public.] We build on "adversarial privacy" [2], a privacy definition that bounds the information outputs give to adversaries. [2] V. Rastogi, M. Hay, G. Miklau, D. Suciu, "Relationship Privacy: Output Perturbation for Queries with Joins", Proc. of the ACM Symposium on Principles of Database Systems, pp. 107-116, 2009.
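
A short sketch of the public Markov process from the figure (the transition probabilities are read off the diagram as reconstructed above, so treat them as illustrative). It computes (π(t−1)ᵀP)ᵀ, the histogram an adversary can predict from public knowledge alone; the definition only worries about what a published histogram reveals beyond this prediction.

    import numpy as np

    # Public two-state Markov process over POIs A and B (illustrative
    # probabilities: A→A 0.9, A→B 0.1, B→A 0.5, B→B 0.5).
    P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])

    pi_prev = np.array([15000.0, 30300.0])   # π(t−1)
    predicted = pi_prev @ P                  # (π(t−1)ᵀ P)ᵀ: the histogram the
    print(predicted)                         # public Markov process predicts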
  • 9. Adversarial privacy. The definition: p(X) is the adversary's prior belief in an event X, and p(X | O) is the adversary's posterior belief in X after observing an output O. The output O is ε-adversarially private iff, for any X, p(X | O) ≤ e^ε · p(X). To apply adversarial privacy to our problem we need to design X, O, and p: X is the event that a person is at POI l_j at time t, i.e. X_t = l_j; O is the histogram published at time t, i.e. π(t); and p is an algorithm computing the adversary's belief. Designing p for several adversary classes, depending on use cases, is one of our contributions (see the sketch below).
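
The definition is easy to mechanize. A minimal sketch: given an adversary's prior and posterior beliefs over the candidate events X (here, candidate POIs), check the bound p(X | O) ≤ e^ε · p(X) for every event. The belief vectors below are hypothetical.

    import numpy as np

    def is_adversarially_private(prior, posterior, epsilon):
        # ε-adversarial privacy: for every event X, the posterior belief
        # p(X | O) must not exceed e^ε times the prior belief p(X).
        prior, posterior = np.asarray(prior), np.asarray(posterior)
        return bool(np.all(posterior <= np.exp(epsilon) * prior))

    # Hypothetical beliefs over two POIs before/after observing an output O.
    print(is_adversarially_private([0.3, 0.7], [0.35, 0.65], 0.2))  # True
    print(is_adversarially_private([0.3, 0.7], [0.60, 0.40], 0.2))  # False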
  • 10–11. Adversary classes. A Markov-Knowledge adversary (MK) guesses which POI a victim is at at time t, using the Markov process and the histograms output before time t. An Any-Person-Knowledge adversary (APK) additionally knows which POI the victim was at at time t − 1. The APK class is stronger than the MK class; today we focus on the APK class.
  • 12. Beliefs of APK-class adversaries. Prior belief before observing the output π(t): p(X_t = l_j | X_{t−1} = l_i, (π(t−1)ᵀP)ᵀ, π(t−1); P). Posterior belief after observing the output π(t): p(X_t = l_j | X_{t−1} = l_i, π(t), π(t−1); P). Thus, the output π(t) is ε-adversarially private for the APK class iff, for all l_i, l_j: p(X_t = l_j | X_{t−1} = l_i, π(t), π(t−1); P) ≤ e^ε · p(X_t = l_j | X_{t−1} = l_i, (π(t−1)ᵀP)ᵀ, π(t−1); P).
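
The slides specify what the APK belief conditions on but not the algorithm p itself, so the following is only one plausible model (an assumption, not the paper's algorithm): start from the Markov row for the known previous POI l_i and reweight each destination by how much the observed histogram deviates from the Markov prediction. Under this model the prior reduces to the Markov row P[i, :], because conditioning on the predicted histogram (π(t−1)ᵀP)ᵀ adds nothing beyond P.

    import numpy as np

    def apk_belief(P, i, observed, predicted):
        # Assumed belief model for an APK adversary who knows X_{t-1} = l_i:
        # weight the Markov row P[i, :] by the observed/predicted ratio of
        # each POI's count, then normalize to a probability distribution.
        w = P[i, :] * (observed / predicted)
        return w / w.sum()

    P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])
    pi_prev = np.array([15000.0, 30300.0])
    predicted = pi_prev @ P                        # (π(t−1)ᵀ P)ᵀ

    prior = apk_belief(P, 0, predicted, predicted)            # = P[0, :]
    posterior = apk_belief(P, 0, np.array([15200.0, 30100.0]), predicted)
    eps = 0.1
    print(np.all(posterior <= np.exp(eps) * prior))  # False: this π(t) leaks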
  • 13. Computing private histograms. Loss of a modified histogram: with π0(t) the original histogram at time t and π(t) the adversarially private histogram at time t, loss(π(t), π0(t)) = ‖π(t) − π0(t)‖₂. Computing an adversarially private histogram is then an optimization problem: minimize loss(π(t), π0(t)) subject to, for all l_i, l_j, p(X_t = l_j | X_{t−1} = l_i, π(t), π(t−1); P) ≤ e^ε · p(X_t = l_j | X_{t−1} = l_i, (π(t−1)ᵀP)ᵀ, π(t−1); P). We employ a heuristic algorithm to solve it (a stand-in sketch follows below).
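
The slides do not describe the heuristic, so as a stand-in here is a sketch that hands the same optimization problem to a generic constrained solver, reusing apk_belief, P, and predicted from the sketch above (and hence inheriting its assumed belief model).

    import numpy as np
    from scipy.optimize import minimize

    def private_histogram(pi0, P, predicted, eps):
        # Minimize ‖π(t) − π0(t)‖² (same minimizer as the L2 loss) subject
        # to posterior(l_j) ≤ e^ε · prior(l_j) for every pair (l_i, l_j).
        n = len(pi0)

        def slack(pi):                       # ≥ 0 iff all constraints hold
            rows = []
            for i in range(n):
                prior = apk_belief(P, i, predicted, predicted)
                post = apk_belief(P, i, pi, predicted)
                rows.append(np.exp(eps) * prior - post)
            return np.concatenate(rows)

        res = minimize(lambda pi: np.sum((pi - pi0) ** 2),
                       x0=predicted,         # the Markov prediction is feasible
                       constraints={"type": "ineq", "fun": slack},
                       bounds=[(1e-9, None)] * n)
        return res.x

    print(private_histogram(np.array([15200.0, 30100.0]), P, predicted, 0.1))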
  • 14–15. Extension to high-order Markov processes. So far we assumed a first-order Markov process, where each element of a published histogram corresponds to a POI. A high-order Markov process lets us publish counts of paths, since any high-order Markov process can be converted into a first-order one (see the sketch below). [Figure: an example second-order Markov process whose states are the paths A→B, A→D, B→C, B→D.] With a second-order process we can publish counts of length-2 paths; in general, our proposal guarantees privacy for publishing counts of n-gram paths.
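
A small sketch of the conversion (the transition probabilities are hypothetical; the slide's figure only names the paths): a second-order process over POIs becomes a first-order process whose states are length-2 paths, where a pair (a, b) can only move to a pair starting with b.

    import numpy as np
    from itertools import product

    def second_to_first_order(P2, states):
        # States of the first-order chain are the length-2 paths that occur
        # as keys of P2; (a, b) -> (b, c) with probability P2[(a, b)][c].
        pairs = [p for p in product(states, repeat=2) if p in P2]
        P = np.zeros((len(pairs), len(pairs)))
        for i, (a, b) in enumerate(pairs):
            for j, (b2, c) in enumerate(pairs):
                if b2 == b:
                    P[i, j] = P2[(a, b)].get(c, 0.0)
        return pairs, P

    # Hypothetical second-order process using the paths from the figure.
    P2 = {("A", "B"): {"C": 0.7, "D": 0.3},
          ("B", "C"): {"A": 1.0}, ("C", "A"): {"B": 1.0},
          ("B", "D"): {"A": 1.0}, ("D", "A"): {"B": 1.0}}
    pairs, P1 = second_to_first_order(P2, "ABCD")
    print(pairs)           # length-2 paths: the histogram elements we publish
    print(P1.sum(axis=1))  # each row sums to 1: a valid first-order process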
  • 16. Evaluation. We set two mining tasks: change-point detection and frequent-path extraction. Dataset: people moving in Tokyo in 1998, provided by the People Flow Project [3]. We construct two small datasets, Shibuya and Machida: Shibuya has many moving people (to evaluate an urban area), while Machida has fewer (to evaluate a suburban area). [3] http://pflow.csis.u-tokyo.ac.jp/index-j.html
  • 17. Number of people (Shibuya). [Figure: counts over time. Legend: Plain = original data; AdvP = our proposal; DP-1 = differential privacy with ε = 1; DP-100 = differential privacy with ε = 100. AdvP is almost the same as Plain, while DP shows errors at less-populated times.]
  • 18. Change-point detection (Shibuya). [Figure: change-point scores over time.] AdvP (our proposal) has errors during rush hours but produces no false positives. DP-1 and DP-100 have many errors; even DP-100, which is too weak a setting to be meaningful, still has errors.
  • 19. Number of people (Machida). [Figure: counts over time. AdvP is almost the same as the original data, while DP adds far too much noise.]
  • 20. Change-point detection (Machida). [Figure: change-point scores over time.] AdvP (our proposal) has errors during rush hours; DP-1 and DP-100 have errors at all times.
  • 21–22. Frequent-path extraction. We employ NDCG [6] to evaluate the accuracy of the outputs (a sketch of the measure follows below). [Figure: NDCG scores (higher is better) for Shibuya and Machida.] Our proposal achieves better results than differential privacy in both Shibuya and Machida, so it is effective for publishing path counts. [6] K. Järvelin, J. Kekäläinen, "IR evaluation methods for retrieving highly relevant documents", Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 41-48, 2000.
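
For reference, a minimal sketch of one common NDCG formulation (Järvelin and Kekäläinen [6] describe several variants; this one discounts by log2(rank + 1)). The relevance scores, standing in for true path frequencies listed in a private histogram's ranking order, are hypothetical.

    import numpy as np

    def dcg(relevances):
        # Discounted cumulative gain: relevance at rank r is discounted by
        # log2(r + 1), with ranks starting at 1.
        rel = np.asarray(relevances, dtype=float)
        return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

    def ndcg(ranked_relevances):
        # Normalize by the DCG of the ideal (descending) ordering, so a
        # perfect ranking scores 1.0 and mistakes pull the score down.
        ideal = sorted(ranked_relevances, reverse=True)
        return dcg(ranked_relevances) / dcg(ideal)

    print(ndcg([40, 30, 25, 10]))  # 1.0: ideal ordering of path frequencies
    print(ndcg([30, 25, 40, 10]))  # ≈ 0.93: a slightly wrong ordering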
  • 23. Conclusion. We propose a new privacy definition that preserves the utility of outputs as much as possible, assuming a Markov process over people's moves and employing the adversarial-privacy framework. Evaluations with two data-mining tasks, change-point detection and frequent-path extraction, show that our definition achieves better utility than differential privacy. Future work: applying the definition to other mining tasks and comparing it with other privacy definitions.
