Random Data Perturbation Techniques and Privacy
Preserving Data Mining
(Authors: H. Kargupta, S. Datta, Q. Wang & K. Sivakumar)

April 26, 2005
Gunjan Gupta

1
Privacy & Good Service: Often Conflicting Goals
• Privacy
– Customer: I don’t want you to share my personal information with anyone.
– Business: I don’t want to share my data with a competitor.

• Quantity, Cost & Quality of Service
– Customer: I want you to provide me more service,
– of good quality,
– and at lower cost.

• Paradox: lower cost often comes from being able to use or share sensitive data, which can be used or misused:
– Provide better service by predicting consumer needs better, or sell information to marketers.
– Optimize load sharing between competing utilities, or preempt the competition.
– A doctor saving a patient by knowing the patient's history, or insurance companies declining coverage to individuals with preexisting conditions.

2
Central Question:
Can we use privacy sensitive data to optimize cost and
quality of a service without compromising any privacy?

3
Short Answer:
No!

4
Long Answer:
Maybe: compromise a small amount of privacy (a small cost
increase) to substantially improve the quality and cost of service
(large cost savings).

5
Why are anonymous exact records not so secure?
• Example: medical insurance premium estimation based on patient history
– Predictive fields are often generic: age, sex, disease history, first two digits of zip code (not allowed in Germany), no. of kids, etc.
– Specifics such as record id (key), name, and address are omitted.
• This can easily be broken by matching non-secure records with the secure anonymous records:

Anonymous “privacy preserving” records:
– Female, 43, 3 kids, 78---, married: anonymous medical record 1
– Female, 43, 2 kids, 78---, single: anonymous medical record 2

Non-secure records:
– Yellow pages: Susan Calvin, 121 Norwood Cr., Austin, TX-78753
– Personal website: “Hi, I am Susan, and here are pictures of me, my husband, and my 3 wonderful kids from my 43rd birthday party!”

Internal human + automated hacker → broken exact record:
Susan Calvin, 43, 3 kids, Address, 78753, now labeled with med. records!
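The matching step on this slide can be sketched in a few lines. This is only an illustration: the field names, the toy records, and the `link` helper are assumptions made up for the example, not anything from the paper.

```python
# Toy linkage attack: join "anonymous" records with public information
# on shared quasi-identifiers. All names and records here are hypothetical.
anonymous_records = [
    {"sex": "F", "age": 43, "kids": 3, "zip2": "78", "married": True,
     "record": "medical record 1"},
    {"sex": "F", "age": 43, "kids": 2, "zip2": "78", "married": False,
     "record": "medical record 2"},
]

# Profile an attacker could assemble from the yellow pages and a personal website.
public_profiles = [
    {"name": "Susan Calvin", "sex": "F", "age": 43, "kids": 3,
     "zip": "78753", "married": True},
]

QUASI_IDENTIFIERS = ("sex", "age", "kids", "married")


def link(profile, records):
    """Return the anonymous records whose quasi-identifiers all match the profile."""
    return [
        r for r in records
        if r["zip2"] == profile["zip"][:2]
        and all(r[key] == profile[key] for key in QUASI_IDENTIFIERS)
    ]


matches = link(public_profiles[0], anonymous_records)
if len(matches) == 1:
    # A unique match re-identifies the supposedly anonymous record.
    print(public_profiles[0]["name"], "->", matches[0]["record"])
```

A unique match is all the attacker needs; no single field is identifying, but the combination is.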

6
Two approaches to Privacy Preserving Data Mining
• Distributed:
– Suitable for multi-party platforms. Share sub-models.
– Unsupervised: Ensemble Clustering, Privacy Preserving Clustering, etc.
– Supervised: Meta-learners, Fourier Spectrum Decision Trees, Collective Hierarchical Clustering, and so on.
– Secure communication based: secure sum, secure scalar product.

• Random Data Perturbation: Our focus
– Perturb data by small amounts to protect privacy of individual records.
– Preserve intrinsic distributions necessary for modeling.
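A minimal sketch of the perturbation idea, assuming additive zero-mean Gaussian noise with a publicly known scale. The synthetic "ages", the seed, and `NOISE_SIGMA` are illustrative choices, not values from the paper.

```python
import random
import statistics

random.seed(0)

NOISE_SIGMA = 5.0  # assumed, publicly known noise scale (illustrative)

# Synthetic sensitive attribute; not data from the paper.
ages = [random.uniform(20, 80) for _ in range(10_000)]

# Each record is released with independent zero-mean noise added.
perturbed = [age + random.gauss(0.0, NOISE_SIGMA) for age in ages]

# Aggregate statistics survive: the mean is nearly unchanged and the
# variance grows by roughly the known noise variance, which can be subtracted.
mean_shift = statistics.mean(perturbed) - statistics.mean(ages)
var_gain = statistics.variance(perturbed) - statistics.variance(ages)

print(f"mean shift: {mean_shift:.3f}")
print(f"variance gain: {var_gain:.2f} (noise variance {NOISE_SIGMA ** 2:.2f})")
```

Individual values move by several years on average, while the statistics a model needs stay recoverable. That is the intended tradeoff.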

7
Recovering approximately correct anonymous
features also breaks privacy
• Somewhat inexactly recovered anonymous record values might also be sufficient:

“Denoised” privacy preserving records:
– Female, 44.5, 3.2 kids, 78---, married: anonymous medical record 1
– Female, 42.2, 2.1 kids, 78---, single: anonymous medical record 2

Non-secure records:
– Yellow pages: Susan Calvin, 121 Norwood Cr., Austin, TX-78753
– Personal website: “Hi, I am Susan, and here are pictures of me, my husband, and my 3 wonderful kids from my 43rd birthday party!”

Internal human + automated hacker → broken exact record:
Susan Calvin, 43, 3 kids, Address, 78753, now labeled with med. records!
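With inexact values, the attacker can swap exact matching for nearest-match scoring. The sketch below is one assumed way to do that: the `distance` function and its scaling of age by 10 years are arbitrary illustrative choices.

```python
# Approximate linkage: pick the denoised record closest to a public profile.
# Records, field names, and the distance scaling are hypothetical.
denoised_records = [
    {"sex": "F", "age": 44.5, "kids": 3.2, "married": True,
     "record": "medical record 1"},
    {"sex": "F", "age": 42.2, "kids": 2.1, "married": False,
     "record": "medical record 2"},
]

profile = {"name": "Susan Calvin", "sex": "F", "age": 43, "kids": 3, "married": True}


def distance(record, target):
    """Score how far a denoised record is from the target profile (lower is closer)."""
    # Categorical fields are assumed unperturbed, so a mismatch rules the record out.
    if record["sex"] != target["sex"] or record["married"] != target["married"]:
        return float("inf")
    # Numeric fields: age is scaled by an assumed tolerance of 10 years.
    return abs(record["age"] - target["age"]) / 10 + abs(record["kids"] - target["kids"])


best = min(denoised_records, key=lambda r: distance(r, profile))
print(profile["name"], "->", best["record"])
```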

8
Anonymous records (with or without small perturbations) are not
secure: not a recently noticed phenomenon
• 1979, Denning, Denning & Schwartz: The Tracker: A Threat to Statistical Database Security
– Shows why anonymous records are not secure.
– Shows an example of recovering the exact salary of a professor from anonymous records.
– Presents a general algorithm for an Individual Tracker.
– Gives a formal probabilistic model and a set of conditions under which a dataset supports such a tracker.

• 1984, Traub, Yemini & Woźniakowski: The Statistical Security of a Statistical Database
– No free lunch: perturbations cause irrecoverable loss in model accuracy.
– However, the holy grail of random perturbation: we can try to find a perturbation algorithm that best trades off loss of privacy against model accuracy.
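The individual tracker can be sketched on a toy statistical database that refuses queries whose result set is too small or too large. The employee table, the threshold `K`, and the helper names are hypothetical. The trick is that if the formula "A and B" identifies one person, the tracker "A and not B" together with "A" leaks that person's value by subtraction.

```python
# Toy statistical database that only answers SUM queries over "large enough" sets.
# All rows and the threshold K are made up for illustration.
EMPLOYEES = [
    {"dept": "CS", "sex": "F", "salary": 95},  # the target: only female in CS
    {"dept": "CS", "sex": "M", "salary": 80},
    {"dept": "CS", "sex": "M", "salary": 70},
    {"dept": "CS", "sex": "M", "salary": 60},
    {"dept": "EE", "sex": "F", "salary": 85},
    {"dept": "EE", "sex": "M", "salary": 75},
    {"dept": "EE", "sex": "F", "salary": 65},
    {"dept": "EE", "sex": "M", "salary": 55},
]
K = 2  # query sets smaller than K or larger than N - K are refused


def query_sum(predicate):
    """Answer SUM(salary) over matching rows, enforcing the query-set-size rule."""
    rows = [e for e in EMPLOYEES if predicate(e)]
    if not K <= len(rows) <= len(EMPLOYEES) - K:
        raise PermissionError("query set size not allowed")
    return sum(e["salary"] for e in rows)


# The direct query about the target is refused (its query set has size 1).
try:
    query_sum(lambda e: e["dept"] == "CS" and e["sex"] == "F")
    direct_refused = False
except PermissionError:
    direct_refused = True

# Tracker: both queries below are allowed, and their difference isolates the target.
sum_a = query_sum(lambda e: e["dept"] == "CS")                       # A
sum_t = query_sum(lambda e: e["dept"] == "CS" and e["sex"] != "F")   # A and not B
recovered_salary = sum_a - sum_t
print("recovered salary:", recovered_salary)
```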

9
Recovering perturbed distributions: Earlier work
• Reconstructing the original distribution from perturbed ones. Setup:
– N samples U1, U2, U3, ..., UN.
– N noise values V1, V2, V3, ..., VN, all drawn from a public (known) distribution V.
– Visible noisy data: W1 = U1 + V1, W2 = U2 + V2, ...
– Assumption: such noise allows you to recover the distribution of U1, U2, U3, ..., UN, but not the individual records.

• Two well-known methods and definitions:
– Agrawal & Srikant, interval based: if a value X can be estimated to lie in the interval [X1, X2] with 95% confidence, then Privacy(X) at confidence 0.95 = X2 - X1.
– Agrawal & Aggarwal, distributional: Privacy(X) = 2^h(X), where h(X) is the differential entropy of X.
[Figure: the interval [X1, X2] marked on a density curve; a density f(x) and its approximation f’(x)]
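As a worked example, both privacy measures have closed forms for two common noise choices. The sketch assumes the privacy of a value is measured through the noise added to it, and that entropy is in bits. The function names are illustrative.

```python
import math
import statistics


def interval_privacy_uniform(a, confidence=0.95):
    """Interval width holding `confidence` of the mass of noise uniform on [-a, a]."""
    return confidence * 2 * a


def interval_privacy_gaussian(sigma, confidence=0.95):
    """Width of the central `confidence` interval of zero-mean Gaussian noise."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma


def entropy_privacy_uniform(a):
    """2^h for noise uniform on [-a, a]; h = log2(2a) bits, so the result is 2a."""
    return 2 ** math.log2(2 * a)


def entropy_privacy_gaussian(sigma):
    """2^h for Gaussian noise; h = 0.5 * log2(2 * pi * e * sigma^2) bits."""
    return 2 ** (0.5 * math.log2(2 * math.pi * math.e * sigma ** 2))


print(interval_privacy_uniform(2.0))    # 0.95 * 4
print(interval_privacy_gaussian(1.0))   # about 3.92 sigma
print(entropy_privacy_uniform(2.0))     # width of the support
print(entropy_privacy_gaussian(1.0))    # sqrt(2 * pi * e) * sigma
```

For uniform noise the entropy-based measure equals the width of the support, which makes 2^h easy to read as "the length of an equivalent uniform interval".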

10
Interval Based Method: Agrawal & Srikant in more detail
• N samples U1, U2, U3, ..., UN
• N noise values V1, V2, V3, ..., VN, all drawn from a public (known) distribution V.
• W1 = U1 + V1, W2 = U2 + V2, ...
• Visible noisy data: W1, W2, W3, ...
Given the noise density fV, Bayes’ rule gives the cumulative posterior distribution function of u in terms of the visible w, the known fV, and the unknown desired density fU.
[Equation: posterior cumulative distribution function of U]

Differentiating w.r.t. u, we get an important recursive definition.
[Equation: recursive definition of fU]

Notation issue (in the paper): f’ simply means an approximation of the true f, not the derivative of f!
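The equation images did not survive in this transcript. As a hedged reconstruction, the standard form of the Agrawal & Srikant estimate, written with this slide's symbols (N samples, noise density f_V, unknown density f_U, and a prime marking an approximation), is:

```latex
% Posterior CDF estimate, averaged over the N visible values w_i
F'_U(a) = \frac{1}{N} \sum_{i=1}^{N}
  \frac{\int_{-\infty}^{a} f_V(w_i - z)\, f_U(z)\, dz}
       {\int_{-\infty}^{\infty} f_V(w_i - z)\, f_U(z)\, dz}

% Differentiating with respect to a gives the density estimate
f'_U(a) = \frac{1}{N} \sum_{i=1}^{N}
  \frac{f_V(w_i - a)\, f_U(a)}
       {\int_{-\infty}^{\infty} f_V(w_i - z)\, f_U(z)\, dz}
```

The definition is recursive because the unknown f_U appears on the right-hand side, so in practice the current estimate is plugged in and the formula is iterated.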

11
Interval Based Method: Agrawal & Srikant in more detail
Algorithm in practice:
• Seed with a uniform distribution for J = 0.
• Iterate from STEP J to STEP J+1 using the recursive definition, with two changes:
– the integration is replaced with a summation over the i.i.d. samples;
– a sum over discrete z intervals is used instead of the integral, for speed.
• Converges to a local minimum? An initialization other than uniform might give a different result. Not explored by the authors.
• For large enough samples, hope to get close to the true distribution.
• Stop when fU(J+1) – fU(J) becomes small.
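The iteration can be sketched with NumPy as below. This is a simplified illustration rather than the authors' implementation. It assumes Gaussian noise, a fixed grid of cell midpoints, and synthetic bimodal data. `reconstruct_density` and `gaussian_pdf` are names invented for the example.

```python
import numpy as np


def reconstruct_density(w, noise_pdf, grid, max_iter=500, tol=1e-6):
    """Estimate the probability mass of U on each grid cell from noisy samples w."""
    f = np.full(grid.size, 1.0 / grid.size)       # step J = 0: uniform seed
    # kernel[i, k] = f_V(w_i - z_k): noise density linking sample i to cell k
    kernel = noise_pdf(w[:, None] - grid[None, :])
    for _ in range(max_iter):
        denom = kernel @ f                        # sum over cells replaces the integral
        f_next = f * (kernel / denom[:, None]).mean(axis=0)  # average over samples
        converged = np.abs(f_next - f).sum() < tol           # stop when the update is small
        f = f_next
        if converged:
            break
    return f


def gaussian_pdf(x, sigma=1.0):
    """Density of zero-mean Gaussian noise (the assumed public noise distribution)."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(0)
# Synthetic bimodal "true" data: an assumption for the demo, not data from the paper.
u = np.concatenate([rng.uniform(1, 3, 1000), rng.uniform(7, 9, 1000)])
w = u + rng.normal(0.0, 1.0, u.size)              # visible perturbed values
grid = np.linspace(0.25, 9.75, 20)                # midpoints of 20 cells on [0, 10]

estimate = reconstruct_density(w, gaussian_pdf, grid)
middle_mass = estimate[(grid > 4) & (grid < 6)].sum()
print("estimated mass between the two modes:", round(float(middle_mass), 3))
```

Each update keeps the estimate normalized, because the per-sample weights sum to one by construction. The seed matters only through which fixed point the iteration reaches, which is the local-minimum caveat raised above.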

12
Interval Based Method: Good results for a variety of noise distributions

13
Revisiting an Essential Assumption in the Random Perturbation
Assumption: Such noise can allow you to recover the distribution
of U1, U2, U3, ..., UN, but not the individual records.
• The authors of this paper challenge this assumption.
• They claim that the added randomness can be mostly visual and not real.
• Many simple forms of random perturbation are “breakable”.

14
Exploit predictable properties of random data to design a filter
to break the perturbation encryption?
[Figure: eigenvalues of spiral data vs. random data. For the random data, all eigenvalues are close to 1!]

15
Spectral Filtering:
Main Idea: Use eigenvalue properties of noise to filter
• U+V data
• Decomposition of eigenvalues of noise and original data
• Recovered data

16
Decomposing eigenvalues: separating data from noise
Let
– U and V be the m x n data and noise matrices,
– UP = U + V be the perturbed matrix.
Covariance matrix of UP = UP^T UP = (U+V)^T (U+V) = U^T U + V^T U + U^T V + V^T V
Since signal and noise are uncorrelated in random perturbation, for
a large no. of observations V^T U ~ 0 and U^T V ~ 0, therefore
UP^T UP = U^T U + V^T V
Since the above 3 matrices are covariance matrices, they are symmetric and
positive semi-definite; therefore, we can perform an eigendecomposition of each.
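A quick numerical check of this decomposition, as a sketch under assumptions: the data are synthetic, the feature loadings in `weights` are made up, and every product is divided by m so that the entries are per-observation averages (the slide omits this normalization).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 4
weights = np.array([[1.0, 2.0, -1.0, 0.5]])   # hypothetical feature loadings
U = rng.normal(size=(m, 1)) @ weights         # correlated "true" data, m x n
V = rng.normal(0.0, 0.5, size=(m, n))         # independent zero-mean noise
UP = U + V                                    # perturbed data

cov_p = UP.T @ UP / m
cov_u = U.T @ U / m
cov_v = V.T @ V / m
cross = (V.T @ U + U.T @ V) / m               # cross terms, expected to be near 0

# Symmetric PSD matrix: eigh returns real eigenvalues (ascending)
# and orthonormal eigenvectors in the columns of Q_p.
eig_p, Q_p = np.linalg.eigh(cov_p)

print("largest cross-term entry:", float(np.abs(cross).max()))
print("eigenvalues of perturbed covariance:", np.round(eig_p, 3))
```

The identity `cov_p = cov_u + cov_v + cross` holds exactly. The approximation on the slide is only that `cross` shrinks as the number of observations grows.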

17
With a bunch of algebra and theorems from matrix perturbation
theory, the authors show that in the limit (lots of data)..

Wigner’s law: describes the distribution of eigenvalues for normal random
matrices:
• The eigenvalues of the noise component V stay within a thin range given by λmin and
λmax (example on the next page) with high probability.
• This allows us to compute λmin and λmax.

Solution!

Giving us the following algorithm:
1. Find a large no. of eigenvalues of the perturbed data P.
2. Separate all eigenvalues that lie between λmin and λmax and save their indices IV.
3. The remaining eigenvalues are the “perturbed” but not noise eigenvalues coming from the true data U: save their indices IU.
4. Break the perturbed eigenvector matrix QP into AU = QP(IU), AV = QP(IV).
5. Estimate the true data as the projection of the perturbed data onto the subspace spanned by AU.
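The five steps can be sketched with NumPy as follows. This is an illustrative simplification, not the authors' code. It assumes i.i.d. noise with known variance, and it uses the random-matrix bound λmax = σ²(1 + sqrt(n/m))² for the noise eigenvalues (a Marchenko-Pastur-type bound, assumed here). Only eigenvalues above λmax are treated as signal. The data are synthetic.

```python
import numpy as np


def spectral_filter(perturbed, noise_var):
    """Estimate the original data from perturbed = U + V, given the noise variance."""
    m, n = perturbed.shape
    mean = perturbed.mean(axis=0)
    centered = perturbed - mean
    cov = centered.T @ centered / m
    eigvals, eigvecs = np.linalg.eigh(cov)          # columns of eigvecs play the role of Q_P
    # Assumed upper edge of the noise eigenvalue range for i.i.d. noise.
    lam_max = noise_var * (1.0 + np.sqrt(n / m)) ** 2
    signal_idx = np.flatnonzero(eigvals > lam_max)  # I_U; the rest is treated as noise
    a_u = eigvecs[:, signal_idx]                    # A_U = Q_P(I_U)
    return centered @ a_u @ a_u.T + mean            # projection onto the signal subspace


rng = np.random.default_rng(0)
m, n = 2000, 10
direction = rng.normal(size=n)
direction /= np.linalg.norm(direction)
# Strongly correlated synthetic features: every row is a multiple of `direction`.
original = 5.0 * rng.normal(size=(m, 1)) * direction
noise = rng.normal(0.0, 1.0, size=(m, n))
perturbed = original + noise

recovered = spectral_filter(perturbed, noise_var=1.0)
mse_perturbed = float(np.mean((perturbed - original) ** 2))
mse_recovered = float(np.mean((recovered - original) ** 2))
print("error before filtering:", round(mse_perturbed, 3))
print("error after filtering: ", round(mse_recovered, 3))
```

The filter removes the noise that lies outside the signal subspace, so the gain depends on how few directions carry the data. With independent features there is no low-dimensional subspace to project onto, and the filter has nothing to exploit.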

18
Results: Quality of eigenvalue recovery

Only the real eigenvalues got captured, because of the nice automatic thresholding!

20
Results: Comparison with Agrawal & Srikant’s reconstruction

[Figure panels: Agrawal & Srikant (no breaking of encryption); Agrawal & Srikant (estimated from broken encryption)]

21
Discussion
• Amazing amount of experimental results and comparisons presented by the authors in the journal version.
• Extension to the situation where the form of the perturbing distribution is known but its exact first, second, or higher order statistics are not: discussed but not presented.
• Comparison of performance with other obvious noise-reduction techniques from the signal processing community:
– Moving averages and Wiener filtering.
– PCA-based filtering.
• Pros and cons of the perturbation analysis by the authors (and in general):
– Effect of more and more noise: rapid degradation of results.
– Problems in dealing with inherent noise in the original data.
– The technique fails when features are independent of each other, because it exploits the covariance matrix. This points to a possible major improvement in the encryption: perform ICA/PCA and then randomize?
– Results suggest that more complex noise models might be harder to break. It is not clear whether this improves the privacy vs. model quality tradeoff.
– Does eigendecomposition carry an inherent metric assumption?

22
A not-so-ominous* application of noise filtering: Nulling
Interferometer on Terrestrial Planet Finder-I

alien ship

*but maybe not if you believe Hollywood movies such as
Independence Day!

23

More Related Content

What's hot

Choice of unit practice problems
Choice of unit practice problemsChoice of unit practice problems
Choice of unit practice problemsjulienorman80065
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clusteringPVP College
 
An enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataAn enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataeSAT Journals
 
Data Science Job Required Skill Analysis
Data Science Job Required Skill AnalysisData Science Job Required Skill Analysis
Data Science Job Required Skill AnalysisHarsh Kevadia
 
Parallel kmeans clustering in Erlang
Parallel kmeans clustering in ErlangParallel kmeans clustering in Erlang
Parallel kmeans clustering in ErlangChinmay Patel
 
coppin chapter 10e.ppt
coppin chapter 10e.pptcoppin chapter 10e.ppt
coppin chapter 10e.pptbutest
 
Business Statistics A Decision Making Approach 8th Edition Groebner Solutions...
Business Statistics A Decision Making Approach 8th Edition Groebner Solutions...Business Statistics A Decision Making Approach 8th Edition Groebner Solutions...
Business Statistics A Decision Making Approach 8th Edition Groebner Solutions...ForemanForemans
 
Cluster analysis using k-means method in R
Cluster analysis using k-means method in RCluster analysis using k-means method in R
Cluster analysis using k-means method in RVladimir Bakhrushin
 
K-means Clustering Algorithm with Matlab Source code
K-means Clustering Algorithm with Matlab Source codeK-means Clustering Algorithm with Matlab Source code
K-means Clustering Algorithm with Matlab Source codegokulprasath06
 
Mathematics online: some common algorithms
Mathematics online: some common algorithmsMathematics online: some common algorithms
Mathematics online: some common algorithmsMark Moriarty
 

What's hot (14)

Choice of unit practice problems
Choice of unit practice problemsChoice of unit practice problems
Choice of unit practice problems
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
Deep learning networks
Deep learning networksDeep learning networks
Deep learning networks
 
An enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataAn enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical data
 
Data Science Job Required Skill Analysis
Data Science Job Required Skill AnalysisData Science Job Required Skill Analysis
Data Science Job Required Skill Analysis
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Parallel kmeans clustering in Erlang
Parallel kmeans clustering in ErlangParallel kmeans clustering in Erlang
Parallel kmeans clustering in Erlang
 
coppin chapter 10e.ppt
coppin chapter 10e.pptcoppin chapter 10e.ppt
coppin chapter 10e.ppt
 
Business Statistics A Decision Making Approach 8th Edition Groebner Solutions...
Business Statistics A Decision Making Approach 8th Edition Groebner Solutions...Business Statistics A Decision Making Approach 8th Edition Groebner Solutions...
Business Statistics A Decision Making Approach 8th Edition Groebner Solutions...
 
Cluster analysis using k-means method in R
Cluster analysis using k-means method in RCluster analysis using k-means method in R
Cluster analysis using k-means method in R
 
Data miningpresentation
Data miningpresentationData miningpresentation
Data miningpresentation
 
K-means Clustering Algorithm with Matlab Source code
K-means Clustering Algorithm with Matlab Source codeK-means Clustering Algorithm with Matlab Source code
K-means Clustering Algorithm with Matlab Source code
 
K means
K meansK means
K means
 
Mathematics online: some common algorithms
Mathematics online: some common algorithmsMathematics online: some common algorithms
Mathematics online: some common algorithms
 

Similar to Random

An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...Md Rakibul Hasan
 
Microarray data noise simulation
Microarray data noise simulationMicroarray data noise simulation
Microarray data noise simulationDespoina Kalfakakou
 
A Study on Privacy Level in Publishing Data of Smart Tap Network
A Study on Privacy Level in Publishing Data of Smart Tap NetworkA Study on Privacy Level in Publishing Data of Smart Tap Network
A Study on Privacy Level in Publishing Data of Smart Tap NetworkHa Phuong
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017MLconf
 
Statistical Preliminaries
Statistical PreliminariesStatistical Preliminaries
Statistical PreliminariesR A Akerkar
 
Compressed sensing techniques for sensor data using unsupervised learning
Compressed sensing techniques for sensor data using unsupervised learningCompressed sensing techniques for sensor data using unsupervised learning
Compressed sensing techniques for sensor data using unsupervised learningSong Cui, Ph.D
 
An Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor NetworkAn Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor NetworkIOSR Journals
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment systemKOYELMAJUMDAR1
 
Project presentation - Capstone
Project presentation  - CapstoneProject presentation  - Capstone
Project presentation - CapstoneSkandha Ch
 
Introduction to Principle Component Analysis
Introduction to Principle Component AnalysisIntroduction to Principle Component Analysis
Introduction to Principle Component AnalysisSunjeet Jena
 
CBM Fault Detection by Carl Byington
CBM Fault Detection by Carl ByingtonCBM Fault Detection by Carl Byington
CBM Fault Detection by Carl ByingtonCarl Byington
 

Similar to Random (20)

1 public embedd
1 public embedd1 public embedd
1 public embedd
 
related
relatedrelated
related
 
main
mainmain
main
 
5
55
5
 
7
77
7
 
7
77
7
 
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
 
Microarray data noise simulation
Microarray data noise simulationMicroarray data noise simulation
Microarray data noise simulation
 
A Study on Privacy Level in Publishing Data of Smart Tap Network
A Study on Privacy Level in Publishing Data of Smart Tap NetworkA Study on Privacy Level in Publishing Data of Smart Tap Network
A Study on Privacy Level in Publishing Data of Smart Tap Network
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
 
Statistical Preliminaries
Statistical PreliminariesStatistical Preliminaries
Statistical Preliminaries
 
Compressed sensing techniques for sensor data using unsupervised learning
Compressed sensing techniques for sensor data using unsupervised learningCompressed sensing techniques for sensor data using unsupervised learning
Compressed sensing techniques for sensor data using unsupervised learning
 
An Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor NetworkAn Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor Network
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment system
 
Project presentation - Capstone
Project presentation  - CapstoneProject presentation  - Capstone
Project presentation - Capstone
 
PCA Final.pptx
PCA Final.pptxPCA Final.pptx
PCA Final.pptx
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Introduction to Principle Component Analysis
Introduction to Principle Component AnalysisIntroduction to Principle Component Analysis
Introduction to Principle Component Analysis
 
CBM Fault Detection by Carl Byington
CBM Fault Detection by Carl ByingtonCBM Fault Detection by Carl Byington
CBM Fault Detection by Carl Byington
 
Data cleaning-outlier-detection
Data cleaning-outlier-detectionData cleaning-outlier-detection
Data cleaning-outlier-detection
 

Recently uploaded

Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...vershagrag
 
Howrah [ Call Girls Kolkata ₹7.5k Pick Up & Drop With Cash Payment 8005736733...
Howrah [ Call Girls Kolkata ₹7.5k Pick Up & Drop With Cash Payment 8005736733...Howrah [ Call Girls Kolkata ₹7.5k Pick Up & Drop With Cash Payment 8005736733...
Howrah [ Call Girls Kolkata ₹7.5k Pick Up & Drop With Cash Payment 8005736733...HyderabadDolls
 
drug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsdrug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsKarishma7720
 
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...gajnagarg
 
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
UIowa Application Instructions - 2024 Update
UIowa Application Instructions - 2024 UpdateUIowa Application Instructions - 2024 Update
UIowa Application Instructions - 2024 UpdateUniversity of Iowa
 
Gabriel_Carter_EXPOLRATIONpp.pptx........
Gabriel_Carter_EXPOLRATIONpp.pptx........Gabriel_Carter_EXPOLRATIONpp.pptx........
Gabriel_Carter_EXPOLRATIONpp.pptx........deejay178
 
Top profile Call Girls In Agartala [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Agartala [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Agartala [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Agartala [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
DMER-AYUSH-MIMS-Staff-Nurse-_Selection-List-04-05-2024.pdf
DMER-AYUSH-MIMS-Staff-Nurse-_Selection-List-04-05-2024.pdfDMER-AYUSH-MIMS-Staff-Nurse-_Selection-List-04-05-2024.pdf
DMER-AYUSH-MIMS-Staff-Nurse-_Selection-List-04-05-2024.pdfReemaKhan31
 
Top profile Call Girls In chittoor [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In chittoor [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In chittoor [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In chittoor [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证eqaqen
 
Dating Call Girls inTiruvallur { 9332606886 } VVIP NISHA Call Girls Near 5 St...
Dating Call Girls inTiruvallur { 9332606886 } VVIP NISHA Call Girls Near 5 St...Dating Call Girls inTiruvallur { 9332606886 } VVIP NISHA Call Girls Near 5 St...
Dating Call Girls inTiruvallur { 9332606886 } VVIP NISHA Call Girls Near 5 St...ruksarkahn825
 
怎样办理伊利诺伊大学厄巴纳-香槟分校毕业证(UIUC毕业证书)成绩单学校原版复制
怎样办理伊利诺伊大学厄巴纳-香槟分校毕业证(UIUC毕业证书)成绩单学校原版复制怎样办理伊利诺伊大学厄巴纳-香槟分校毕业证(UIUC毕业证书)成绩单学校原版复制
怎样办理伊利诺伊大学厄巴纳-香槟分校毕业证(UIUC毕业证书)成绩单学校原版复制yynod
 
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...nirzagarg
 
Guide to a Winning Interview May 2024 for MCWN
Guide to a Winning Interview May 2024 for MCWNGuide to a Winning Interview May 2024 for MCWN
Guide to a Winning Interview May 2024 for MCWNBruce Bennett
 
Top profile Call Girls In daman [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In daman [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In daman [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In daman [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
B.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak KumarB.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak KumarDeepak15CivilEngg
 
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...gynedubai
 

Recently uploaded (20)

Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
 
Howrah [ Call Girls Kolkata ₹7.5k Pick Up & Drop With Cash Payment 8005736733...
Howrah [ Call Girls Kolkata ₹7.5k Pick Up & Drop With Cash Payment 8005736733...Howrah [ Call Girls Kolkata ₹7.5k Pick Up & Drop With Cash Payment 8005736733...
Howrah [ Call Girls Kolkata ₹7.5k Pick Up & Drop With Cash Payment 8005736733...
 
drug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsdrug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstings
 
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
 
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
 
UIowa Application Instructions - 2024 Update
UIowa Application Instructions - 2024 UpdateUIowa Application Instructions - 2024 Update
UIowa Application Instructions - 2024 Update
 
Gabriel_Carter_EXPOLRATIONpp.pptx........
Gabriel_Carter_EXPOLRATIONpp.pptx........Gabriel_Carter_EXPOLRATIONpp.pptx........
Gabriel_Carter_EXPOLRATIONpp.pptx........
 
Top profile Call Girls In Agartala [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Agartala [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Agartala [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Agartala [ 7014168258 ] Call Me For Genuine Models ...
 
DMER-AYUSH-MIMS-Staff-Nurse-_Selection-List-04-05-2024.pdf
DMER-AYUSH-MIMS-Staff-Nurse-_Selection-List-04-05-2024.pdfDMER-AYUSH-MIMS-Staff-Nurse-_Selection-List-04-05-2024.pdf
DMER-AYUSH-MIMS-Staff-Nurse-_Selection-List-04-05-2024.pdf
 
Top profile Call Girls In chittoor [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In chittoor [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In chittoor [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In chittoor [ 7014168258 ] Call Me For Genuine Models ...
 
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
 
Dating Call Girls inTiruvallur { 9332606886 } VVIP NISHA Call Girls Near 5 St...
Dating Call Girls inTiruvallur { 9332606886 } VVIP NISHA Call Girls Near 5 St...Dating Call Girls inTiruvallur { 9332606886 } VVIP NISHA Call Girls Near 5 St...
Dating Call Girls inTiruvallur { 9332606886 } VVIP NISHA Call Girls Near 5 St...
 
怎样办理伊利诺伊大学厄巴纳-香槟分校毕业证(UIUC毕业证书)成绩单学校原版复制
怎样办理伊利诺伊大学厄巴纳-香槟分校毕业证(UIUC毕业证书)成绩单学校原版复制怎样办理伊利诺伊大学厄巴纳-香槟分校毕业证(UIUC毕业证书)成绩单学校原版复制
怎样办理伊利诺伊大学厄巴纳-香槟分校毕业证(UIUC毕业证书)成绩单学校原版复制
 
Cara Gugurkan Kandungan Awal Kehamilan 1 bulan (087776558899)
Cara Gugurkan Kandungan Awal Kehamilan 1 bulan (087776558899)Cara Gugurkan Kandungan Awal Kehamilan 1 bulan (087776558899)
Cara Gugurkan Kandungan Awal Kehamilan 1 bulan (087776558899)
 
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
 
Guide to a Winning Interview May 2024 for MCWN
Guide to a Winning Interview May 2024 for MCWNGuide to a Winning Interview May 2024 for MCWN
Guide to a Winning Interview May 2024 for MCWN
 
Top profile Call Girls In daman [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In daman [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In daman [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In daman [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Anantapur [ 7014168258 ] Call Me For Genuine Models...
 
B.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak KumarB.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak Kumar
 
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
 

Random

  • 1. Random Data Perturbation Techniques and Privacy Preserving Data Mining (Authors: H. Kargupta, S. Datta, Q. Wang & K. Sivakumar) April 26, 2005 Gunjan Gupta 1
  • 2. Privacy & Good Service: Often Conflicting Goals • Privacy – Customer: I don’t want you to share my personal information with anyone. – Business: I don’t want to share my data with a competitor. • Quantity, Cost & Quality of Service – Customer: I want you to provide me lower cost of service – and good quality. – and at lower cost. • Paradox: lower cost often comes from being able to use/share sensitive data that can be used or misused: – Provide better service by predicting consumer needs better, or sell information to marketers. – Optimize load sharing between competing utilities or preempting competition. – Doctor saving patient by knowing patient history or insurance companies declining coverage to individuals with preexisting conditions. 2
  • 3. Central Question: Can we use privacy sensitive data to optimize cost and quality of a service without compromising any privacy? 3
  • 5. Long Answer: Maybe compromise a small amount of privacy (low cost increase) to improve quality and cost of service (high cost savings) substantially. 5
  • 6. Why anonymous exact records not so secure? • Example : medical insurance premium estimation based on patient history – Predictive fields often generic: age, sex, disease history, first two digits of zip code (not allowed in Germany). no. of kids etc. – Specifics such as record id (key), name, address omitted. • This could be easily broken by matching non-secure records with secure anonymous records: Anonymous “privacy preserving records” Yellowpages Female, 43, 3 kids, 78---,married, anonymous medical record 1 Female, 43, 2 kids, 78---, single anonymous medical record 2 Internal Human + Automated hacker Broken Exact record Susan Calvin, 121 Norwood Cr. Austin, TX-78753 Personal website Hi, I am Susan, and here are pictures of me, my husband, and my 3 wonderful kids from my 43rd birthday party! Susan Calvin, 43, 3 kids, Address, 78733, now labeled med. Records! 6
  • 7. Two approaches to Privacy Preserving • Distributed: – Suitable for multi-party platforms. Share sub-models. – Unsupervised: Ensemble Clustering, Privacy Preserving Clustering etc. – Supervised: Meta-learners, Fourier Spectrum Decision Trees, Collective Hierarchical Clustering and so on.. – Secure communication based: Secure sum, secure scalar product • Random Data Perturbation: Our focus – Perturb data by small amounts to protect privacy of individual records. – Preserve intrinsic distributions necessary for modeling. 7
  • 8. Recovering approximately correct anonymous features also breaks privacy
      • Somewhat inexactly recovered anonymous record values might also be sufficient:
          – “Denoised” privacy preserving records:
              Female, 44.5, 3.2 kids, 78---, married, anonymous medical record 1
              Female, 42.2, 2.1 kids, 78---, single, anonymous medical record 2
          – Yellowpages: Susan Calvin, 121 Norwood Cr., Austin, TX-78753
          – Personal website: “Hi, I am Susan, and here are pictures of me, my husband, and my 3 wonderful kids from my 43rd birthday party!”
          – Internal human + automated hacker
          – Broken exact record: Susan Calvin, 43, 3 kids, Address, 78733, now labeled med. records!
  • 9. Anonymous records (with or without small perturbations) are not secure: not a recently noticed phenomenon
      • 1979, Denning & Denning: The Tracker: A Threat to Statistical Database Security
          – Show why anonymous records are not secure.
          – Show an example of recovering the exact salary of a professor from anonymous records.
          – Present a general algorithm for an Individual Tracker.
          – Give a formal probabilistic model and a set of conditions under which a dataset supports such a tracker.
      • 1984, Traub, Yemini & Woźniakowski: The Statistical Security of a Statistical Database
          – No free lunch: perturbations cause irrecoverable loss in model accuracy.
          – However, the holy grail of random perturbation: we can try to find a perturbation algorithm that best trades off loss of privacy against model accuracy.
  • 10. Recovering perturbed distributions: Earlier work
      • Reconstructing the original distribution from perturbed ones. Setup:
          – n samples U1, U2, U3, ..., Un
          – n noise values V1, V2, V3, ..., Vn, all taken from a public (known) distribution V.
          – Visible noisy data: W1 = U1+V1, W2 = U2+V2, ...
          – Assumption: such noise can allow you to recover the distribution of U1, U2, U3, ..., Un, but not the individual records.
      • Two well known methods and definitions:
          – Agrawal & Srikant: interval based. Privacy(X) at confidence 0.95 = x2 – x1, the width of the interval [x1, x2] within which X can be placed with 95% confidence.
          – Agrawal & Aggarwal: distributional. Privacy(X) = 2^h(X), where h(X) is the differential entropy of X.
      • [Figure: an interval x1 to x2 on a density curve; true density f(x) and its estimate f’(x).]
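The setup on this slide can be illustrated with a minimal sketch. The attribute range, the Gaussian choice for the public noise distribution V, and the helper name `perturb` are assumptions made for the example, not details from the slides. It shows that aggregate statistics of U survive the perturbation while individual values move substantially.

```python
import random
import statistics


def perturb(values, noise_sd, rng):
    """Return w_i = u_i + v_i, with v_i drawn from a public N(0, noise_sd^2)."""
    return [u + rng.gauss(0.0, noise_sd) for u in values]


rng = random.Random(0)

# Hypothetical private attribute (e.g. ages), n = 20000 records.
u = [rng.uniform(20.0, 60.0) for _ in range(20000)]
noise_sd = 10.0  # assumed to be public knowledge, like the distribution V
w = perturb(u, noise_sd, rng)

# E[W] = E[U] because the noise has zero mean.
mean_est = statistics.fmean(w)
# Var(W) = Var(U) + Var(V) because U and V are independent.
var_est = statistics.pvariance(w) - noise_sd ** 2
# Individual records, by contrast, are moved by a large amount on average.
mean_abs_change = statistics.fmean([abs(a - b) for a, b in zip(u, w)])

print(f"true mean {statistics.fmean(u):.2f}, estimated {mean_est:.2f}")
print(f"true var  {statistics.pvariance(u):.2f}, estimated {var_est:.2f}")
print(f"mean |w_i - u_i| = {mean_abs_change:.2f}")
```

Only the first two moments are recovered here; recovering the full density is what the interval-based reconstruction method addresses.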
  • 11. Interval Based Method: Agrawal & Srikant in more detail
      • n samples U1, U2, U3, ..., Un
      • n noise values V1, V2, V3, ..., Vn, all taken from a public (known) distribution V.
      • W1 = U1+V1, W2 = U2+V2, ...
      • Visible noisy data: W1, W2, W3, ...
      • Given the noise function fV, Bayes’ rule gives the cumulative posterior distribution function of u in terms of w (visible), fV, and the unknown desired function fU.
      • Differentiating w.r.t. u gives an important recursive definition of fU.
      • Notation issue (in paper): f’ simply means an approximation of the true f, not the derivative of f!
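The equations for this slide are not preserved in the transcript. As a hedged reconstruction, the following is the form these expressions take in the Agrawal & Srikant formulation as best recalled; the symbol names follow the slide (f_V known noise density, f_U unknown data density, w_i visible values), and the prime denotes an estimate, not a derivative.

```latex
% Posterior CDF of a single record U_1 given its visible value w_1 (Bayes' rule)
F'_{U_1}(a) = \frac{\int_{-\infty}^{a} f_V(w_1 - z)\, f_U(z)\, dz}
                   {\int_{-\infty}^{\infty} f_V(w_1 - z)\, f_U(z)\, dz}

% Average over all n visible values
F'_U(a) = \frac{1}{n} \sum_{i=1}^{n} F'_{U_i}(a)

% Differentiate w.r.t. a: f_U appears on both sides, hence "recursive"
f'_U(a) = \frac{1}{n} \sum_{i=1}^{n}
          \frac{f_V(w_i - a)\, f_U(a)}
               {\int_{-\infty}^{\infty} f_V(w_i - z)\, f_U(z)\, dz}
```

Because the unknown f_U appears on the right-hand side, the last expression is used iteratively: plug in the current estimate of f_U to obtain the next one.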
  • 12. Interval Based Method: Agrawal & Srikant in more detail
      • Algorithm in practice:
          – Seed with a uniform distribution for J = 0.
          – Step J to step J+1: integration replaced with summation over the i.i.d. samples.
          – Sum over discrete z intervals instead of an integral, for speed.
      • Converges to a local minimum? An initialization other than uniform might give a different result. Not explored by the authors.
      • For large enough samples, hope to get close to the true distribution.
      • Stop when fU(J+1) – fU(J) becomes small.
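The iteration described on this slide can be sketched roughly as follows. This is a minimal illustration under assumptions that are not in the slides: Gaussian noise, a fixed attribute range split into equal-width bins, bin midpoints standing in for the discrete z intervals, and the names `reconstruct`, `bins`, `max_iter`, and `tol`.

```python
import math
import random


def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x (the assumed public noise density f_V)."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def reconstruct(w, noise_sd, lo, hi, bins=20, max_iter=100, tol=1e-4):
    """Estimate bin probabilities of U from visible values w_i = u_i + v_i."""
    width = (hi - lo) / bins
    mids = [lo + (k + 0.5) * width for k in range(bins)]
    # f_V(w_i - z_k) for every visible value and every bin midpoint.
    like = [[normal_pdf(wi - z, noise_sd) for z in mids] for wi in w]
    p = [1.0 / bins] * bins  # step J = 0: uniform seed
    for _ in range(max_iter):
        new_p = [0.0] * bins
        for row in like:
            # Discrete stand-in for the integral in the denominator.
            denom = sum(l * pk for l, pk in zip(row, p))
            for k in range(bins):
                new_p[k] += row[k] * p[k] / denom
        new_p = [v / len(w) for v in new_p]  # average over the samples
        change = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:  # stop when successive estimates barely differ
            break
    return mids, p


rng = random.Random(1)
# Hypothetical bimodal private data: two clusters around 10 and 30.
u = [rng.uniform(8.0, 12.0) if rng.random() < 0.5 else rng.uniform(28.0, 32.0)
     for _ in range(1000)]
w = [ui + rng.gauss(0.0, 3.0) for ui in u]

mids, p = reconstruct(w, noise_sd=3.0, lo=0.0, hi=40.0)
for z, pk in zip(mids, p):
    print(f"{z:5.1f}  {'#' * int(round(200 * pk))}")
```

The printed histogram should show the two modes of the original data re-emerging from the noisy values, which is the sense in which the distribution, rather than any single record, is recovered.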
  • 13. Interval Based Method: Good results for a variety of noise distributions
  • 14. Revisiting an Essential Assumption in Random Perturbation
      • Assumption: such noise can allow you to recover the distribution of U1, U2, U3, ..., Un, but not the individual records.
      • The authors of this paper challenge this assumption.
      • Claim: the added randomness can be mostly visual and not real.
      • Many simple forms of random perturbation are “breakable”.
  • 15. Exploit predictable properties of random data to design a filter to break the perturbation encryption?
      • All eigenvalues close to 1!
      • [Figure panels: spiral data; random data.]
  • 16. Spectral Filtering: Main idea: use the eigenvalue properties of noise to filter
      • U+V data
      • Decomposition of the eigenvalues of noise and original data
      • Recovered data
  • 17. Decomposing eigenvalues: separating data from noise
      • Let U and V be the m x n data and noise matrices, and UP = U + V the perturbed matrix.
      • Covariance matrix of UP: UP^T UP = (U+V)^T (U+V) = U^T U + V^T U + U^T V + V^T V
      • Since signal and noise are uncorrelated in random perturbation, for a large no. of observations V^T U ~ 0 and U^T V ~ 0, therefore UP^T UP = U^T U + V^T V
      • Since the above 3 matrices are correlation matrices, they are symmetric and positive semi-definite, so we can perform an eigen decomposition of each.
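A small numerical check of this decomposition, as a sketch under assumed conditions: NumPy is available, the data are zero-mean, the sizes m and n and the one-factor structure of U are made up for illustration, and the matrices are scaled by 1/m so that entries stay O(1).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50_000, 3  # m observations, n features (assumed sizes)

# Hypothetical correlated "true" data U: every column driven by one latent factor.
latent = rng.normal(size=(m, 1))
U = latent @ np.array([[2.0, -1.0, 0.5]]) + 0.1 * rng.normal(size=(m, n))
V = rng.normal(scale=1.0, size=(m, n))  # independent zero-mean noise
UP = U + V                              # perturbed matrix

cov_UP = UP.T @ UP / m
cov_U = U.T @ U / m
cov_V = V.T @ V / m
cross = (V.T @ U + U.T @ V) / m

# The four-term expansion is an exact identity ...
assert np.allclose(cov_UP, cov_U + cov_V + cross)
# ... and the cross terms shrink as the number of observations grows.
max_cross = float(np.abs(cross).max())
print("largest cross-term entry:", max_cross)

# All three matrices are symmetric PSD, so eigh applies to each of them.
print("eigenvalues of perturbed covariance:", np.linalg.eigvalsh(cov_UP))
print("eigenvalues of noise covariance:    ", np.linalg.eigvalsh(cov_V))
```

The noise covariance has all eigenvalues near the noise variance, while the perturbed covariance has one eigenvalue far above that level, which is the structure the filtering step exploits.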
  • 18. With a bunch of algebra and theorems from matrix perturbation theory, the authors show that in the limit (lots of data):
      • Wigner’s law describes the distribution of eigenvalues for normal random matrices:
          – With high probability, the eigenvalues of the noise component V stay within a thin range given by λmin and λmax (example on the next slide).
          – This allows us to compute λmin and λmax. Solution!
      • This gives the following algorithm:
          1. Find a large number of eigenvalues of the perturbed data P.
          2. Separate all eigenvalues inside [λmin, λmax] and save their row indices IV.
          3. Take the remaining eigen indices to get the “perturbed” but not noise eigens coming from the true data U: save their row indices IU.
          4. Break the perturbed eigenvector matrix QP into AU = QP(IU), AV = QP(IV).
          5. Estimate the true data as the projection of the perturbed data onto the subspace spanned by AU.
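The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: NumPy is available, the data are zero-mean, the noise is i.i.d. with known standard deviation, the bounds use the form λ = σ²(1 ± sqrt(n/m))² for an m x n noise matrix (assumed here to correspond to the slide's λmin and λmax), and the projection is taken as UP · AU · AU^T. The function name `spectral_filter` and the synthetic rank-2 data are hypothetical.

```python
import numpy as np


def spectral_filter(UP, noise_sd):
    """Estimate U from UP = U + V by discarding noise-like eigen directions."""
    m, n = UP.shape
    cov = UP.T @ UP / m                  # assumes zero-mean columns
    eigvals, Q = np.linalg.eigh(cov)     # step 1; eigenvectors are columns of Q
    # Assumed bounds on the eigenvalues of pure i.i.d. noise with variance sd^2.
    lam_min = noise_sd ** 2 * (1.0 - np.sqrt(n / m)) ** 2
    lam_max = noise_sd ** 2 * (1.0 + np.sqrt(n / m)) ** 2
    is_noise = (eigvals >= lam_min) & (eigvals <= lam_max)   # step 2: I_V
    A_U = Q[:, ~is_noise]                # steps 3-4: eigenvectors indexed by I_U
    U_hat = UP @ A_U @ A_U.T             # step 5: project onto span(A_U)
    return U_hat, eigvals, (lam_min, lam_max)


rng = np.random.default_rng(0)
m, n = 5000, 20

# Hypothetical private data with strong correlations (rank 2 across 20 features).
U = rng.normal(size=(m, 2)) @ (3.0 * rng.normal(size=(2, n)))
V = rng.normal(scale=1.0, size=(m, n))
UP = U + V

U_hat, eigvals, (lam_min, lam_max) = spectral_filter(UP, noise_sd=1.0)

rmse_raw = float(np.sqrt(np.mean((UP - U) ** 2)))
rmse_filtered = float(np.sqrt(np.mean((U_hat - U) ** 2)))
print(f"noise band: [{lam_min:.3f}, {lam_max:.3f}]")
print(f"RMSE of perturbed data vs. truth: {rmse_raw:.3f}")
print(f"RMSE of filtered data vs. truth:  {rmse_filtered:.3f}")
```

In this synthetic setting the filtered estimate should sit noticeably closer to the private values than the released perturbed data does. The gain depends on the features being strongly correlated; with independent features there is no low-dimensional signal subspace to project onto.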
  • 19. Exploit predictable properties of random data to design a filter to break the perturbation encryption?
      • All eigenvalues close to 1!
      • [Figure panels: spiral data; random data.]
  • 20. Results: Quality of eigenvalue recovery
      • Only the real eigenvalues got captured, because of the nice automatic thresholding!
  • 21. Results: Comparison with Agrawal & Srikant’s reconstruction
      • [Figure panels: Agrawal & Srikant (no breaking of encryption); Agrawal & Srikant (estimated from broken encryption).]
  • 22. Discussion
      • An amazing amount of experimental results and comparisons is presented by the authors in the journal version.
      • Extension to the situation where the form of the perturbing distribution is known but its exact first-, second- or higher-order statistics are not: discussed but not presented.
      • Comparison of performance with other obvious noise-reduction techniques from the signal processing community:
          – Moving averages and Wiener filtering.
          – PCA-based filtering.
      • Pros and cons of the perturbation analysis by the authors (and in general):
          – Effect of more and more noise: rapid degradation of results.
          – Problems in dealing with inherent noise in the original data.
          – The technique fails when features are independent of each other, because it exploits the covariance matrix. This points to a major possible improvement in encryption: perform ICA/PCA and then randomize?
          – Results suggest that more complex noise models might be harder to break. It is not clear whether this improves the privacy vs. model quality tradeoff.
          – Does eigen decomposition carry an inherent metric assumption?
  • 23. A not-so-ominous* application of noise filtering: Nulling Interferometer on Terrestrial Planet Finder-I
      • [Figure label: alien ship]
      • *but maybe not, if you believe Hollywood movies such as Independence Day!