1
Random Data Perturbation Techniques and Privacy
Preserving Data Mining
Gunjan Gupta
(Authors: H. Kargupta, S. Datta, Q. Wang & K. Sivakumar)
April 26, 2005
2
Privacy & Good Service: Often Conflicting Goals
• Privacy
– Customer: I don’t want you to share my personal information with anyone.
– Business: I don’t want to share my data with a competitor.
• Quantity, Cost & Quality of Service
– Customer: I want you to provide me more service,
– of good quality,
– and at lower cost.
• Paradox: lower cost often comes from being able to use/share sensitive
data that can be used or misused:
– Provide better service by predicting consumer needs better, or sell information
to marketers.
– Optimize load sharing between competing utilities or preempting competition.
– Doctor saving patient by knowing patient history or insurance companies
declining coverage to individuals with preexisting conditions.
3
Central Question:
Can we use privacy-sensitive data to optimize the cost and quality of a service without compromising any privacy?
4
Short Answer:
No!
5
Long Answer:
Maybe: compromise a small amount of privacy (low cost increase) to substantially improve the quality and cost of service (high cost savings).
6
Why are anonymous exact records not so secure?
• Example: medical insurance premium estimation based on patient history
– Predictive fields are often generic: age, sex, disease history, first two digits of zip code (not allowed in Germany), no. of kids, etc.
– Specifics such as record id (key), name, and address are omitted.
• This can easily be broken by matching non-secure records with secure anonymous records:
– Yellow pages: Susan Calvin, 121 Norwood Cr., Austin, TX-78753
– Personal website: “Hi, I am Susan, and here are pictures of me, my husband, and my 3 wonderful kids from my 43rd birthday party!”
– Anonymous “privacy preserving records”:
  anonymous medical record 1: Female, 43, 3 kids, 78---, married
  anonymous medical record 2: Female, 43, 2 kids, 78---, single
– Internal human + automated hacker match these sources.
– Broken exact record: Susan Calvin, 43, 3 kids, Address, 78733, now labeled med. records!
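The matching step can be sketched in a few lines of Python. This is only an illustration of the idea, not a method from the paper: the field names (`sex`, `age`, `kids`, `zip2`, `married`) and the `link` helper are hypothetical, and the records simply mirror the example on this slide.

```python
# Public facts about one person, assembled from the yellow pages and a personal website.
# Field names are hypothetical; values mirror the slide's example.
public_profile = {"name": "Susan Calvin", "sex": "F", "age": 43, "kids": 3,
                  "zip": "78753", "married": True}

# "Anonymous" records: no name or address, only generic predictive fields.
anonymous_records = [
    {"record": 1, "sex": "F", "age": 43, "kids": 3, "zip2": "78", "married": True},
    {"record": 2, "sex": "F", "age": 43, "kids": 2, "zip2": "78", "married": False},
]

def link(profile, records):
    """Return the ids of anonymous records whose generic fields all match the profile."""
    matches = []
    for rec in records:
        if (rec["sex"] == profile["sex"]
                and rec["age"] == profile["age"]
                and rec["kids"] == profile["kids"]
                and rec["married"] == profile["married"]
                and profile["zip"].startswith(rec["zip2"])):
            matches.append(rec["record"])
    return matches

matched = link(public_profile, anonymous_records)
print(matched)
```

A single surviving match is enough to attach a name to the medical record.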
7
Two approaches to Privacy Preserving Data Mining
• Distributed:
– Suitable for multi-party platforms. Share sub-models.
– Unsupervised: Ensemble Clustering, Privacy Preserving Clustering etc.
– Supervised: Meta-learners, Fourier Spectrum Decision Trees, Collective Hierarchical Clustering, and so on.
– Secure communication based: Secure sum, secure scalar product
• Random Data Perturbation: Our focus
– Perturb data by small amounts to protect privacy of individual records.
– Preserve intrinsic distributions necessary for modeling.
8
Recovering approximately correct anonymous
features also breaks privacy
• Somewhat inexactly recovered anonymous record values might also be sufficient:
– Yellow pages: Susan Calvin, 121 Norwood Cr., Austin, TX-78753
– Personal website: “Hi, I am Susan, and here are pictures of me, my husband, and my 3 wonderful kids from my 43rd birthday party!”
– “Denoised” privacy preserving records:
  anonymous medical record 1: Female, 44.5, 3.2 kids, 78---, married
  anonymous medical record 2: Female, 42.2, 2.1 kids, 78---, single
– Internal human + automated hacker match these sources.
– Broken exact record: Susan Calvin, 43, 3 kids, Address, 78733, now labeled med. records!
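A rough sketch of why inexact values can still be enough: matching within a tolerance instead of matching exactly. The tolerances, field names, and the `candidates` helper are assumptions made for illustration; the record values come from this slide.

```python
# Facts about the target gathered from public sources (values from the slide).
known = {"age": 43, "kids": 3, "married": True}

# Denoised (approximately recovered) records; field names are hypothetical.
denoised_records = [
    {"record": 1, "age": 44.5, "kids": 3.2, "married": True},
    {"record": 2, "age": 42.2, "kids": 2.1, "married": False},
]

# Assumed tolerances: how far a denoised value may sit from the true one.
tolerance = {"age": 2.0, "kids": 0.5}

def candidates(target, records, tol):
    """Return ids of records consistent with the target within the tolerances."""
    hits = []
    for rec in records:
        exact_ok = rec["married"] == target["married"]
        numeric_ok = all(abs(rec[k] - target[k]) <= tol[k] for k in tol)
        if exact_ok and numeric_ok:
            hits.append(rec["record"])
    return hits

hits = candidates(known, denoised_records, tolerance)
print(hits)
```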
9
Anonymous records (with or without small perturbations) are not secure: not a recently noticed phenomenon
• 1979, Denning & Denning: The Tracker: A Threat to Statistical Database Security
– Shows why anonymous records are not secure.
– Shows an example of recovering the exact salary of a professor from anonymous records.
– Presents a general algorithm for an Individual Tracker.
– Gives a formal probabilistic model and a set of conditions under which a dataset supports such a tracker.
• 1984, Traub & Yemini: The Statistical Security of a Statistical Database
– No free lunch: perturbations cause irrecoverable loss in model accuracy.
– However, the holy grail of random perturbation: we can try to find a perturbation algorithm that gives the best trade-off between loss of privacy and model accuracy.
10
Recovering perturbed distributions: Earlier work
• Reconstructing the original distribution from perturbed ones. Setup:
– N samples U1, U2, U3, ..., Un
– N noise values V1, V2, V3, ..., Vn, all drawn from a public (known) distribution V.
– Visible noisy data: W1=U1+V1, W2=U2+V2, ...
– Assumption: such noise allows you to recover the distribution of U1, U2, U3, ..., Un, but not the individual records.
• Two well-known methods and definitions:
– Agrawal & Srikant: Interval based: Privacy(X) at confidence 0.95 = X2 - X1, the width of the interval [X1, X2] that contains the true value with 95% confidence
– Agrawal & Aggarwal: Distributional: Privacy(X) = 2^{h(X)}, where h(X) is the differential entropy of X
[Figure: densities f(x) and f’(x), with the interval endpoints X1 and X2 marked]
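To make the two privacy definitions concrete, here is a small sketch that evaluates them for uniform and Gaussian noise. The function names are hypothetical, and the closed forms are the standard ones for these two distributions rather than anything shown on the slide.

```python
import math
from statistics import NormalDist

def interval_privacy_uniform(alpha, confidence=0.95):
    """Width of an interval holding `confidence` of the mass of Uniform[-alpha, alpha] noise."""
    return confidence * 2 * alpha

def interval_privacy_gaussian(sigma, confidence=0.95):
    """Width of the central interval holding `confidence` of the mass of N(0, sigma^2) noise."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma

def entropy_privacy_uniform(length):
    """2^{h(X)} for X uniform on an interval of the given length; h(X) = log2(length)."""
    return 2 ** math.log2(length)

def entropy_privacy_gaussian(sigma):
    """2^{h(X)} for X ~ N(0, sigma^2); h(X) = 0.5 * log2(2 * pi * e * sigma^2)."""
    return 2 ** (0.5 * math.log2(2 * math.pi * math.e * sigma ** 2))

print(interval_privacy_uniform(1.0), interval_privacy_gaussian(1.0))
print(entropy_privacy_uniform(2.0), entropy_privacy_gaussian(1.0))
```

Under the entropy-based definition, a uniform variable on an interval of length a has privacy exactly a, which is what makes the measure easy to interpret.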
11
Interval Based Method: Agrawal & Srikant in more detail
• N samples U1, U2, U3, ..., Un
• N noise values V1, V2, V3, ..., Vn, all drawn from a public (known) distribution V.
• W1=U1+V1, W2=U2+V2, ...
• Visible noisy data: W1, W2, W3, ...
Given the noise density fV, Bayes’ rule expresses the posterior cumulative distribution function of u in terms of w (visible), fV, and the unknown desired density fU.
Differentiating w.r.t. u, we get an important recursive definition of fU.
Notation issue (in paper): f’ simply means an approximation of the true f, not the derivative of f!
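For reference, the two expressions described on this slide can be written as follows. This is a sketch in the slide's notation (U, V, W) of the standard form of the Agrawal & Srikant derivation; the evaluation point a, the integration variable z, and the sample count n are notational assumptions.

```latex
% Posterior CDF estimate, averaged over the n visible samples w_i
F'_U(a) = \frac{1}{n} \sum_{i=1}^{n}
  \frac{\int_{-\infty}^{a} f_V(w_i - z)\, f_U(z)\, dz}
       {\int_{-\infty}^{\infty} f_V(w_i - z)\, f_U(z)\, dz}

% Differentiating with respect to a gives the density estimate
f'_U(a) = \frac{1}{n} \sum_{i=1}^{n}
  \frac{f_V(w_i - a)\, f_U(a)}
       {\int_{-\infty}^{\infty} f_V(w_i - z)\, f_U(z)\, dz}
```

The definition is recursive because the unknown f_U appears on the right-hand side, which is why it is solved by iterating from an initial guess.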
12
Interval Based Method: Agrawal & Srikant in more detail
Algorithm in practice:
• Seed with a uniform distribution for J=0.
• Compute the estimate at STEP J+1 from the estimate at STEP J using the recursive definition.
• Sum over discrete z intervals instead of integrating, for speed.
• Integration is replaced with summation over the i.i.d. samples.
• Stop when fU(J+1) – fU(J) becomes small.
• Converges to a local minimum? An initialization other than uniform might give a different result. Not explored by the authors.
• For large enough samples, we hope to get close to the true distribution.
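The iteration can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid, the Gaussian noise, the bimodal test data, the stopping tolerance, and the name `reconstruct_density` are all choices made for the example.

```python
import numpy as np

def reconstruct_density(w, noise_pdf, grid, max_iter=200, tol=1e-4):
    """Estimate the density of U on `grid` from noisy samples w = u + v."""
    dz = grid[1] - grid[0]
    f = np.full(grid.size, 1.0 / (grid.size * dz))       # J = 0: uniform seed
    kernel = noise_pdf(w[:, None] - grid[None, :])       # kernel[i, k] = f_V(w_i - z_k)
    for _ in range(max_iter):
        denom = kernel @ f * dz                          # sum over z cells replaces the integral
        denom = np.maximum(denom, 1e-300)                # guard against division by zero
        f_next = f * (kernel / denom[:, None]).mean(axis=0)   # average over the samples
        f_next /= f_next.sum() * dz                      # keep it a proper density
        change = np.abs(f_next - f).sum() * dz
        f = f_next
        if change < tol:                                 # stop when successive estimates agree
            break
    return f

# Hypothetical test data: a two-bump distribution hidden under Gaussian noise.
rng = np.random.default_rng(0)
n = 2000
u = np.where(rng.random(n) < 0.7, rng.normal(2.0, 0.5, n), rng.normal(-2.0, 0.5, n))
sigma = 1.0
w = u + rng.normal(0.0, sigma, n)

def gaussian_pdf(x):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

grid = np.linspace(-6.0, 6.0, 121)
f_hat = reconstruct_density(w, gaussian_pdf, grid)
print("estimated mean:", (grid * f_hat).sum() * (grid[1] - grid[0]), "true mean:", u.mean())
```

Only the noisy values `w` and the public noise density enter the reconstruction; the original `u` is used just to compare against the result.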
13
Interval Based Method: Good Results for a variety of noises
14
Revisiting an Essential Assumption in Random Perturbation
Assumption: such noise allows you to recover the distribution of U1, U2, U3, ..., Un, but not the individual records.
• The authors of this paper challenge this assumption.
• Claim: the added randomness can be mostly visual and not real.
• Many simple forms of random perturbation are “breakable”.
15
Exploit predictable properties of random data to design a filter to break the perturbation encryption?
[Figure: two panels, “Spiral data” and “Random data”]
Random data: all eigenvalues close to 1!
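The property is easy to check numerically. The sketch below compares the covariance eigenvalues of pure unit-variance noise with those of structured data; the matrix sizes are assumptions, and the two-factor data set is only a stand-in for the spiral shown on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 10                       # assumed sizes: m records, n features

# Pure noise: i.i.d. standard normal entries (unit variance).
noise = rng.normal(0.0, 1.0, size=(m, n))
noise_eigs = np.linalg.eigvalsh(noise.T @ noise / m)

# Structured data: n features driven by 2 hidden factors (stand-in for the spiral).
factors = rng.normal(0.0, 1.0, size=(m, 2))
mixing = rng.normal(0.0, 1.0, size=(2, n))
data = factors @ mixing
data_eigs = np.linalg.eigvalsh(data.T @ data / m)

print("noise eigenvalues:", np.round(noise_eigs, 3))
print("data eigenvalues: ", np.round(data_eigs, 3))
```

The noise eigenvalues bunch tightly around the noise variance, while the structured data puts its variance into a few large eigenvalues.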
16
Spectral Filtering:
Main idea: use the eigenvalue properties of noise to filter it out
• U+V data
• Decomposition of eigenvalues of noise and original data
• Recovered data
17
Decomposing eigen-values: separating data from noise
Let
– U and V be the m x n data and noise matrices
– U_P be the perturbed matrix, U_P = U + V
Covariance matrix of U_P:
$U_P^T U_P = (U+V)^T (U+V) = U^T U + V^T U + U^T V + V^T V$
Since signal and noise are uncorrelated in random perturbation, for a large no. of observations $V^T U \approx 0$ and $U^T V \approx 0$, therefore
$U_P^T U_P \approx U^T U + V^T V$
Since the above 3 matrices are covariance matrices, they are symmetric and positive semi-definite; therefore, we can perform eigendecomposition on each of them.
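A quick numerical check of the two claims above, as a sketch: the cross terms shrink for many observations, and the perturbed covariance admits an eigendecomposition. The data set, the sizes, and the 1/m normalization are assumptions; `Q_P` and `lam_P` are names chosen to match the notation used on the next slide.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20000, 5                                          # assumed sizes

U = rng.normal(size=(m, 2)) @ rng.normal(size=(2, n))    # hypothetical correlated data
V = rng.normal(0.0, 1.0, size=(m, n))                    # independent noise
U_P = U + V

cov_U = U.T @ U / m
cov_V = V.T @ V / m
cov_P = U_P.T @ U_P / m
cross = V.T @ U / m                                      # should shrink as m grows

# A symmetric PSD matrix factors as Q diag(lam) Q^T with orthonormal Q.
lam_P, Q_P = np.linalg.eigh(cov_P)
rebuilt = Q_P @ np.diag(lam_P) @ Q_P.T

print("largest cross term:", np.abs(cross).max())
print("largest gap between cov_P and cov_U + cov_V:", np.abs(cov_P - cov_U - cov_V).max())
```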
18
With a bunch of algebra and theorems from matrix perturbation theory, the authors derive a result that holds in the limit (lots of data), giving us the following algorithm:
1. Find a large no. of eigenvalues of the perturbed data U_P.
2. Separate all eigenvalues that lie between λmin and λmax and save their row indices I_V.
3. Take the remaining eigenvalues, which are “perturbed” but not noise and come from the true data U; save their row indices I_U.
4. Split the perturbed eigenvector matrix Q_P into A_U = Q_P(I_U) and A_V = Q_P(I_V).
5. Estimate the true data as a projection onto A_U.
Wigner’s law: describes the distribution of eigenvalues for normal random matrices:
• Eigenvalues of the noise component V stay within a thin range given by λmin and λmax (example on the next slide) with high probability.
• Allows us to compute λmin and λmax.
Solution!
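The five steps can be sketched as follows. This is a hedged illustration rather than the authors' code. The slide does not show the formula for λmin and λmax, so the band edges below use the standard random-matrix form for sample covariances of i.i.d. noise, which is an assumption. The projection `X @ A_U @ A_U.T`, the mean-centering, and the name `spectral_filter` are likewise choices made for this example.

```python
import numpy as np

def spectral_filter(U_P, noise_var):
    """Estimate the original data from U_P = U + V, given the variance of the i.i.d. noise V."""
    m, n = U_P.shape
    mean = U_P.mean(axis=0)
    X = U_P - mean                                   # center; zero-mean noise leaves the mean intact
    lam, Q_P = np.linalg.eigh(X.T @ X / m)           # step 1: eigen-pairs of the perturbed covariance

    # Assumed noise band for an m x n matrix of i.i.d. noise with variance noise_var.
    spread = np.sqrt(n / m)
    lam_min = noise_var * (1.0 - spread) ** 2
    lam_max = noise_var * (1.0 + spread) ** 2

    I_V = np.flatnonzero((lam >= lam_min) & (lam <= lam_max))   # step 2: noise eigenvalues
    I_U = np.setdiff1d(np.arange(n), I_V)                       # step 3: the remaining ones
    A_U = Q_P[:, I_U]                                # step 4 (A_V = Q_P[:, I_V] holds the noise directions)
    return X @ A_U @ A_U.T + mean                    # step 5: project onto the span of A_U

# Hypothetical data: 20 features driven by 2 hidden factors, plus unit-variance noise.
rng = np.random.default_rng(0)
m, n = 5000, 20
U = rng.normal(size=(m, 2)) @ (2.0 * rng.normal(size=(2, n)))
V = rng.normal(0.0, 1.0, size=(m, n))
U_hat = spectral_filter(U + V, noise_var=1.0)

err_noisy = np.linalg.norm(V)                        # error of the perturbed data itself
err_filtered = np.linalg.norm(U_hat - U)             # error after filtering
print("error ratio after filtering:", err_filtered / err_noisy)
```

Only the perturbed matrix and the public noise variance are needed; the original `U` is used just to measure how much of the noise was removed.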
19
Exploit predictable properties of random data to design a filter to break the perturbation encryption?
[Figure: two panels, “Spiral data” and “Random data”]
Random data: all eigenvalues close to 1!
20
Results: Quality of eigenvalue recovery
Only the real eigenvalues got captured, because of the nice automatic thresholding!
21
Results: Comparison with Agrawal & Srikant’s reconstruction
[Figure: two panels, “Agrawal & Srikant (no breaking of encryption)” and “Agrawal & Srikant (estimated from broken encryption)”]
22
Discussion
• An amazing amount of experimental results and comparisons is presented by the authors in the journal version.
• Extension to the situation where the form of the perturbing distribution is known but its exact first-, second-, or higher-order statistics are not: discussed but not presented.
• Comparison of performance with other obvious noise-reduction techniques from the signal processing community:
– Moving averages and Wiener filtering.
– PCA-based filtering.
• Pros and cons of the perturbation analysis by the authors (and in general):
– Effect of more and more noise: rapid degradation of results.
– Problem in dealing with inherent noise in the original data.
– The technique fails when features are independent of each other, because it exploits the covariance matrix. This points to a major possible improvement in encryption: perform ICA/PCA and then randomize?
– Results suggest that more complex noise models might be harder to break. It is not clear whether this improves the privacy vs. model quality trade-off.
– Does eigendecomposition carry an inherent metric assumption?
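The failure mode for independent features can be seen in a small experiment. This is a sketch under assumptions: the feature variance, the sizes, and the upper band edge for unit-variance noise are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 10
U = rng.normal(0.0, 2.0, size=(m, n))          # independent features, variance 4 (assumed)
V = rng.normal(0.0, 1.0, size=(m, n))          # unit-variance noise
U_P = U + V

lam, Q_P = np.linalg.eigh(U_P.T @ U_P / m)
lam_max = (1.0 + np.sqrt(n / m)) ** 2          # assumed upper edge of the unit-noise band
signal = lam > lam_max                         # directions the filter would keep
A_U = Q_P[:, signal]
U_hat = U_P @ A_U @ A_U.T                      # projection onto the kept directions

print("eigenvalues:", np.round(lam, 2))
print("kept directions:", int(signal.sum()), "of", n)
```

With no correlation to exploit, every eigenvalue sits above the noise band, every direction is kept, and the projection returns the perturbed data unchanged.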
23
A not-so-ominous* application of noise filtering: Nulling
Interferometer on Terrestrial Planet Finder-I
*but maybe not if you believe Hollywood movies such as
Independence Day!
[Figure label: alien ship]

  • 1. 1 Random Data Perturbation Techniques and Privacy Preserving Data Mining Gunjan Gupta (Authors: H. Kargupta, S. Datta, Q. Wang & K. Sivakumar) April 26, 2005
  • 2. 2 Privacy & Good Service: Often Conflicting Goals • Privacy – Customer: I don’t want you to share my personal information with anyone. – Business: I don’t want to share my data with a competitor. • Quantity, Cost & Quality of Service – Customer: I want you to provide me lower cost of service – and good quality. – and at lower cost. • Paradox: lower cost often comes from being able to use/share sensitive data that can be used or misused: – Provide better service by predicting consumer needs better, or sell information to marketers. – Optimize load sharing between competing utilities or preempting competition. – Doctor saving patient by knowing patient history or insurance companies declining coverage to individuals with preexisting conditions.
  • 3. 3 Can we use privacy sensitive data to optimize cost and quality of a service without compromising any privacy? Central Question:
  • 5. 5 Long Answer: Maybe compromise a small amount of privacy (low cost increase) to improve quality and cost of service (high cost savings) substantially.
  • 6. 6 Why anonymous exact records not so secure? • Example : medical insurance premium estimation based on patient history – Predictive fields often generic: age, sex, disease history, first two digits of zip code (not allowed in Germany). no. of kids etc. – Specifics such as record id (key), name, address omitted. • This could be easily broken by matching non-secure records with secure anonymous records: Susan Calvin, 121 Norwood Cr. Austin, TX-78753 Hi, I am Susan, and here are pictures of me, my husband, and my 3 wonderful kids from my 43rd birthday party! Female, 43, 3 kids, 78---,married, anonymous medical record 1 Female, 43, 2 kids, 78---, single anonymous medical record 2 Yellowpages Personal website Anonymous “privacy preserving records” Susan Calvin, 43, 3 kids, Address, 78733, now labeled med. Records! Internal Human + Automated hacker Broken Exact record
  • 7. 7 Two approaches to Privacy Preserving • Distributed: – Suitable for multi-party platforms. Share sub-models. – Unsupervised: Ensemble Clustering, Privacy Preserving Clustering etc. – Supervised: Meta-learners, Fourier Spectrum Decision Trees, Collective Hierarchical Clustering and so on.. – Secure communication based: Secure sum, secure scalar product • Random Data Perturbation: Our focus – Perturb data by small amounts to protect privacy of individual records. – Preserve intrinsic distributions necessary for modeling.
  • 8. 8 Recovering approximately correct anonymous features also breaks privacy • Somewhat inexactly recovered anonymous record values might also be sufficient: Susan Calvin, 121 Norwood Cr. Austin, TX-78753 Hi, I am Susan, and here are pictures of me, my husband, and my 3 wonderful kids from my 43rd birthday party! yellowpages Personal website “Denoised” privacy preserving records Susan Calvin, 43, 3 kids, Address, 78733, now labeled med. Records! Internal Human + Automated hacker Broken Exact record Female, 44.5, 3.2 kids, 78---,married, anonymous medical record 1 Female, 42.2, 2.1 kids, 78---, single, anonymous medical record 2
  • 9. 9 Anonymous records (with or without) small perturbations not secure: not a recently noticed phenomena • 1979, Denning & Denning: The Tracker: A Threat to Statistical Database Security – Show why anonymous records are not secure. – Show example of recovering exact salary of a professor from anonymous records. – Present a general algorithm for an Individual Tracker. – A formal probabilistic model and set of conditions that make a dataset support such a tracker. • 1984, Traub & Yemin: The Statistical Security of a Statistical Database: – No free lunch: perturbations cause irrecoverable loss in model accuracy. – However, the holy grail of random perturbation: We can try to find a perturbation algorithm that best trades off between loss of privacy vs. model accuracy.
  • 10. 10 Recovering perturbed distributions: Earlier work • Reconstructing Original Distribution from Perturbed Ones. Setup: – N samples U1, U2, U3.. Xn – N noise values V1, V2, V3.. Vn all taken from a public(known) distribution V. – Visible noisy data: W1=U1+V1, W2=U2+V2 . . – Assumption: Such noise can allow you to recover the distribution of X1,X2,X3 ..Xn, but not the individual record’s. • Two well known methods and definitions: – Agrawal & Srikant: Interval based: Privacy(X) at Confidence 0.95= X2-X1 – Agrawal & Aggarwal: Distributional Privacy(X)=2h(x) X1 X2 f(x) f’(x)
  • 11. 11 Interval Based Method: Agrawal & Srikant in more detail • N samples U1, U2, U3.. Xn • N noise values V1, V2, V3.. Vn all taken from a public(known) distribution V. • W1=U1+V1, W2=U2+V2 . . • Visible noisy data: W1, W2, W3 .. Given: noise function fV , using Bayes’ Rule, we can show that the cumulative posterior distribution function of u in terms of w (visible) and fV , and unknown desired function fu , Differentiating w.r.t. u we get an important recursive definition: Notation issue (in paper): f‘ simply means approximation of true f, not derivative of f !
  • 12. 12 Interval Based Method: Agrawal & Srikant in more detail Algorithm in practice: – Seed with a uniform distribution for J=0. – Compute the estimate at step J+1 from the estimate at step J. – Replace the integration with a summation over the i.i.d. samples, and sum over discrete z intervals instead of integrating, for speed. – Stop when fU(J+1) – fU(J) becomes small. • Converges to a local minimum? An initialization other than uniform might give a different result. Not explored by the authors. • For large enough samples, hope to get close to the true distribution.
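The iteration on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `reconstruct_density` and `gaussian_pdf`, the grid, the Gaussian noise and the bimodal test data are all hypothetical choices made for the example.

```python
import numpy as np

def reconstruct_density(w, noise_pdf, grid, n_iter=200, tol=1e-6):
    """Estimate f_U on an evenly spaced `grid` from noisy values w = u + v.

    Hypothetical helper: seeds with a uniform density and repeats the
    Bayes update, replacing the integral over z with a sum over the grid.
    """
    dz = grid[1] - grid[0]
    f = np.full(grid.size, 1.0 / (grid.size * dz))      # uniform seed (step J=0)
    K = noise_pdf(w[:, None] - grid[None, :])           # K[i, k] = f_V(w_i - z_k)
    for _ in range(n_iter):
        # Denominator: discretized integral of f_V(w_i - z) f_U(z) dz, one per sample
        denom = np.maximum((K * f).sum(axis=1) * dz, 1e-300)
        f_new = f * (K / denom[:, None]).mean(axis=0)    # step J -> step J+1
        f_new /= f_new.sum() * dz                        # keep it a density
        converged = np.abs(f_new - f).max() < tol        # stop when the change is small
        f = f_new
        if converged:
            break
    return f

def gaussian_pdf(x, sigma=1.0):
    """Density of the public noise distribution (assumed Gaussian here)."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Illustrative data: a bimodal "true" distribution hidden behind unit-variance noise.
rng = np.random.default_rng(0)
u = np.concatenate([rng.normal(-2, 0.5, 1000), rng.normal(2, 0.5, 1000)])
w = u + rng.normal(0, 1.0, u.size)
grid = np.linspace(-6, 6, 121)
f_hat = reconstruct_density(w, gaussian_pdf, grid)
```

Only `w` and the noise density enter the estimate; the individual `u` values are never used, which is exactly the assumption that the rest of the talk goes on to challenge.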
  • 13. 13 Interval Based Method: Good Results for a variety of noises
  • 14. 14 Revisiting an Essential Assumption in Random Perturbation Assumption: such noise allows you to recover the distribution of U1, U2, U3, ..., Un, but not the individual records. • The authors of this paper challenge this assumption. • They claim that the added randomness can be mostly visual and not real. • Many simple forms of random perturbation are "breakable".
  • 15. 15 Exploit predictable properties of random data to design a filter to break the perturbation encryption? (Figure panels: spiral data; random data, where all eigenvalues are close to 1!)
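The "all eigenvalues close to 1" observation is easy to check numerically. The sketch below is only an illustration: the sizes are arbitrary, the spiral data of the figure is swapped for simple correlated data driven by two hidden factors, and the formula used for the edges `lam_min` and `lam_max` is an assumption (the usual asymptotic bounds for the sample covariance of i.i.d. noise), since the slides do not state it.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 20          # m observations, n features (illustrative sizes)
sigma2 = 1.0             # variance of the added noise

# Pure i.i.d. Gaussian noise and the eigenvalues of its sample covariance
V = rng.normal(0.0, np.sqrt(sigma2), size=(m, n))
noise_eigs = np.linalg.eigvalsh(V.T @ V / m)

# Assumed asymptotic edges of the noise spectrum (not given on the slide)
Q = m / n
lam_min = sigma2 * (1 - 1 / np.sqrt(Q)) ** 2
lam_max = sigma2 * (1 + 1 / np.sqrt(Q)) ** 2

# Correlated data for contrast: n features driven by 2 hidden factors
Z = rng.normal(size=(m, 2))
B = rng.normal(size=(2, n))
U = Z @ B
data_eigs = np.linalg.eigvalsh(U.T @ U / m)

print("noise eigenvalue range  :", noise_eigs.min(), noise_eigs.max())
print("assumed noise bounds    :", lam_min, lam_max)
print("largest data eigenvalues:", data_eigs[-2:])
```

The noise eigenvalues bunch tightly around the noise variance, while the structured data puts almost all of its variance into a couple of large eigenvalues. That gap is what a spectral filter can exploit.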
  • 16. 16 Spectral Filtering: Main idea: use the eigenvalue properties of the noise to filter it out • U+V data • Decomposition of eigenvalues of noise and original data • Recovered data
  • 17. 17 Decomposing eigenvalues: separating data from noise Let U and V be the m x n data and noise matrices, and UP = U+V the perturbed matrix (subscript P for perturbed). Covariance matrix of UP: UP^T UP = (U+V)^T (U+V) = U^T U + V^T U + U^T V + V^T V. Since signal and noise are uncorrelated in random perturbation, for a large number of observations V^T U ~ 0 and U^T V ~ 0, therefore UP^T UP = U^T U + V^T V. Since the above 3 matrices are covariance matrices, they are symmetric and positive semi-definite, so each of them has an eigendecomposition.
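The decomposition itself is missing from the transcript. A plausible reconstruction is given below. Q_P matches the perturbed eigenvector matrix named on the next slide, while Q_U, Q_V and the diagonal eigenvalue matrices \Lambda are assumed notation.

```latex
% Eigendecomposition of each symmetric positive semi-definite matrix
U_P^{T} U_P = Q_P \Lambda_P Q_P^{T}, \qquad
U^{T} U     = Q_U \Lambda_U Q_U^{T}, \qquad
V^{T} V     = Q_V \Lambda_V Q_V^{T}

% Substituting into U_P^T U_P = U^T U + V^T V
Q_P \Lambda_P Q_P^{T} = Q_U \Lambda_U Q_U^{T} + Q_V \Lambda_V Q_V^{T}
```

Each Q is orthogonal (its columns are eigenvectors) and each \Lambda holds the corresponding non-negative eigenvalues on its diagonal.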
  • 18. 18 With a bunch of algebra and theorems from matrix perturbation theory, the authors derive a result that holds in the limit (lots of data), giving us the following algorithm: 1. Find a large number of eigenvalues of the perturbed data UP. 2. Separate out all eigenvalues between λmin and λmax and save their indices IV. 3. The remaining eigenvalues are the "perturbed" but non-noise ones coming from the true data U: save their indices IU. 4. Split the perturbed eigenvector matrix QP into AU = QP(IU) and AV = QP(IV). 5. Estimate the true data as the projection of the perturbed data onto the subspace spanned by AU. Wigner's law describes the distribution of eigenvalues for normal random matrices: • With high probability, the eigenvalues of the noise component V stay in a thin range given by λmin and λmax (example on the next slide). • This allows us to compute λmin and λmax. Solution!
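The five steps above can be sketched as follows. This is an illustration under assumptions, not the authors' code: `spectral_filter` is a hypothetical name, the bound used for λmax is the usual asymptotic edge for the sample covariance of i.i.d. noise, the data is centered before the decomposition, and the projection is written as UP·AU·AUᵀ, which is one natural reading of step 5.

```python
import numpy as np

def spectral_filter(UP, noise_var):
    """Estimate the original data from perturbed data UP = U + V.

    Hypothetical sketch: assumes i.i.d. zero-mean noise of known variance.
    """
    m, n = UP.shape
    mean = UP.mean(axis=0)
    X = UP - mean                                   # center (zero-mean noise assumed)
    eigvals, QP = np.linalg.eigh(X.T @ X / m)       # step 1: eigen-pairs, ascending order
    # Assumed upper edge of the noise spectrum for an m x n noise matrix
    lam_max = noise_var * (1 + np.sqrt(n / m)) ** 2
    # Steps 2-3: eigenvalues above lam_max are attributed to the true data.
    # Only the upper bound is used, so small noise eigenvalues are never kept.
    signal = eigvals > lam_max
    AU = QP[:, signal]                              # step 4: data eigenvectors
    U_hat = X @ AU @ AU.T + mean                    # step 5: project onto span(AU)
    return U_hat, eigvals, lam_max

# Illustrative data: 20 correlated features driven by 2 hidden factors, plus noise.
rng = np.random.default_rng(0)
m, n = 4000, 20
U = rng.normal(size=(m, 2)) @ (3.0 * rng.normal(size=(2, n)))
V = rng.normal(0.0, 1.0, size=(m, n))
UP = U + V

U_hat, eigvals, lam_max = spectral_filter(UP, noise_var=1.0)
err_noisy = np.linalg.norm(UP - U)
err_filtered = np.linalg.norm(U_hat - U)
print("error before filtering:", err_noisy)
print("error after filtering :", err_filtered)
```

The filter only needs the perturbed data and the public noise variance. Because the features here are strongly correlated, most of the noise lies outside the data subspace and is removed by the projection, so the filtered records are much closer to the originals than the perturbed ones.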
  • 19. 19 Exploit predictable properties of random data to design a filter to break the perturbation encryption? (Figure panels: spiral data; random data, where all eigenvalues are close to 1!)
  • 20. 20 Results: Quality of eigenvalue recovery. Only the real eigenvalues were captured, because of the nice automatic thresholding!
  • 21. 21 Results: Comparison with Agrawal & Srikant's reconstruction (Figure panels: Agrawal & Srikant, no breaking of encryption; Agrawal & Srikant, estimated from broken encryption)
  • 22. 22 Discussion • An impressive amount of experimental results and comparisons is presented by the authors in the journal version. • Extension to the situation where the form of the perturbing distribution is known but its exact first-, second- or higher-order statistics are not: discussed but not presented. • Comparison of performance with other obvious noise-reduction techniques from the signal processing community: – Moving averages and Wiener filtering. – PCA-based filtering. • Pros and cons of the perturbation analysis by the authors (and in general): – Effect of more and more noise: rapid degradation of results. – Problem in dealing with inherent noise in the original data. – The technique fails when features are independent of each other, because it exploits the covariance matrix. This points to a major possible improvement in encryption: perform ICA/PCA and then randomize? – Results suggest that more complex noise models might be harder to break. Not clear whether this improves the privacy vs. model quality tradeoff. – Eigendecomposition has an inherent metric assumption?
  • 23. 23 A not-so-ominous* application of noise filtering: Nulling Interferometer on Terrestrial Planet Finder-I *but maybe not if you believe Hollywood movies such as Independence Day! alien ship