SlideShare a Scribd company logo
1 of 5
Download to read offline
DESENSITIZED RDCA SUBSPACES FOR COMPRESSIVE PRIVACY IN MACHINE
LEARNING
Artur Filipowicz Thee Chanyaswad S.Y. Kung
Princeton University
Princeton, NJ
ABSTRACT
The quest for better data analysis and artificial intelligence has lead
to more and more data being collected and stored. As a consequence,
more data are exposed to malicious entities. This paper examines the
problem of privacy in machine learning for classification. We utilize
the Ridge Discriminant Component Analysis (RDCA) to desensitize
data with respect to a privacy label. Based on five experiments, we
show that desensitization by RDCA can effectively protect privacy
(i.e. low accuracy on the privacy label) with small loss in utility.
On HAR and CMU Faces datasets, the use of desensitized data re-
sults in random guess level accuracies for privacy at a cost of 5.14%
and 0.04%, on average, drop in the utility accuracies. For Semeion
Handwritten Digit dataset, accuracies of the privacy-sensitive digits
are almost zero, while the accuracies for the utility-relevant digits
drop by 7.53% on average. This presents a promising solution to the
problem of privacy in machine learning for classification.
Index Terms— Compressive Privacy, Ridge Discriminant Com-
ponent Analysis (RDCA), Privacy-preserving Data Mining/Machine
Learning, Data Desensitization, Dimension Reduction
1. INTRODUCTION
Innovation in the 21st
Century electronics centers around data pro-
cessing. Progress is fueled by the symbiotic relationship between
big data and machine learning in which machine learning allows us
to interpret big data and big data allows us to train large machine
learning models. In the world of big data, videos, photos, emails,
banking transactions, browsing history, GPS tracks, and other per-
sonal data are continuously collected and stored by organizations for
analysis. These data may be circulated around the Internet without
the data owner’s knowledge and be at risk of exposure to malicious
entities. A few recent data leakages are described by [1], and many
other possible attacks on privacy have been reported or proposed
[2, 3, 4, 5].
The complete problem of maintaining privacy is complex. It is
distributed temporally since data owner’s present and past actions
can compromise privacy. It is distributed spatially as the data owner
has personal information in multiple accounts, devices, and physi-
cal locations. Our focus is privacy protection in the context of ma-
chine learning for classification, at the time and location the data
owner, the user, submits his/her data to a machine learning service,
the server.
Thanks to the Brandeis Program of the Defense Advanced Research
Project Agency (DARPA) and Space and Naval Warfare System Center Pa-
cific (SSC Pacific) under Contract No. 66001-15-C-4068 for funding support,
and to Professor J. Morris Chang from Iowa State University for invaluable
discussion and assistances.
A classical solution to this problem is encryption. The user en-
crypts his/her information before submission, and the server decodes
the submitted data. However, the server may leak these data to mali-
cious entities. Therefore, it should not be trusted and should not re-
ceive information which compromises privacy. In machine learning
for classification, this is information which maximizes the classifica-
tion accuracy of the utility label while minimizing the classification
accuracy of the privacy label. Several different ideas have been pro-
posed to attack this problem such as noisy data reconstruction [6, 7],
rotational and random perturbation [8, 9], microaggregation of data
[10], privacy-centric classifier designs [11, 12, 13], etc.
Our method is based upon Compressive Privacy [14, 15, 16] ap-
proach to this problem. It utilizes the concept of data desensitization
– modifying the data by reducing the number of features such that
the privacy is protected. We employ Ridge Discriminant Compo-
nent Analysis (RDCA) [17, 18, 15] to desensitize data before they
are submitted to a server. By deriving the RDCA components with
respect to the privacy label, two subspaces are attained – the privacy
signal and privacy noise subspaces. The privacy noise subspace is
the subspace which has minimal classification power with respect to
the privacy label. Therefore, the proposed method utilizes this pri-
vacy noise subspace to project the data onto in order to desensitize
the data. Even if the sever leaks the desensitized data, a malicious
entity cannot use these data to classify the user under the privacy
label.
Based on the properties of the utility and privacy labels, we de-
fine three problem classes – Common-Unique Privacy, Common-
Common Privacy and Split-Label Privacy problems. To test the po-
tency of our method on our three different classes, we present an ex-
ample dataset – HAR, CMU Faces, and Semeion Handwritten Digit
– for each class and show that desensitization by RDCA can effec-
tively protect privacy by reducing the privacy accuracy to the ran-
dom guess level in the HAR and CMU Faces datasets, and to almost
zero on the privacy-sensitive digits in the Semeion Handwritten Digit
dataset. On the other hand, the utility accuracies only drop by 5.14%
in the HAR dataset, on average by 0.04% in the CMU Faces dataset,
and on average by 7.53% on the utility-relevant digits in the Semeion
Handwritten Digit dataset. This confirms that the proposed desensi-
tization method by RDCA is promising in providing a solution in
privacy-preserving machine learning.
2. PRIVACY PROBLEM CLASSES
In a standard classification problem, for a given set of supervised
training data of N samples and M features, {X, y}, a classifier is
trained to predict y based on X. When considering privacy, we de-
fine a privacy label y(p)
and a utility label y(u)
. Our objective is
then to minimize the possibility of y(p)
being predicted based on X
Fig. 1. Illustration for the RDCA signal and noise subspaces with
M = 2 and L = 2. In the signal subspace (left), the distance be-
tween class centroids is high, where as in the noise subspace (right),
the distance between class centroids is low relative to the distances
among all of the samples.
while maximizing our classifier’s ability to predict y(u)
. Based on
this approach, we define three classes of problems.
To define these problem classes, we first introduce the ideas of
a common and unique label. A unique label has a different class
for each user, for example social security number. A common label
has classes which are shared by multiple users, eye color being an
example.
The three problems we define are Common-Unique Privacy,
Common-Common Privacy, and Split-Label Privacy.
• In Common-Unique Privacy problem, y(u)
or y(p)
is a com-
mon label while the other label is unique. An application
example we explore is human activity recognition. In this ex-
ample, the unique label is the identity of the user submitting
activity data, while the common label is the type of activity
(walking, running, etc.) performed by the user.
• In Common-Common Privacy problem, both y(u)
and y(p)
are common labels. An example we examine is facial feature
recognition. Specifically, for each image, there are two com-
mon labels indicating if the user is wearing sunglasses and his
pose.
• Lastly, in Split-Label Privacy problem, y(u)
and y(p)
are de-
rived from a single label y where some classes are grouped.
Our example is Optical Character Recognition where we
wish to recognize digits 0 to 4 while protecting digits 5 to
9 from being recognized. The digits may be responses on a
survey about marriage where 0 to 4 represents single-never-
married, single-divorced, single-widowed etc. and digits 5 to
9 represent married-male-female, married-male-male, etc. In
this case, the response between 5-9 may leak private infor-
mation about sexual orientation, and therefore should not be
uniquely identifiable, unlike the utility digits 0-4, of which
unique identifiability may be useful for marketing purposes.
3. DESENSITIZATION BY RDCA SUBSPACE
PROJECTION
3.1. Ridge Discriminant Component Analysis (RDCA)
Ridge Discriminant Component Analysis (RDCA) [17, 18, 15] aims
at finding the subspace that maximizes the discriminant distance
among the classes. Given an L-class classification problem, RDCA
is able to provide the (L − 1)-dimensional subspace where all dis-
criminant power lies, and the remaining subspace where no discrim-
inant power remains. Conceptually, RDCA aims at maximizing the
Fig. 2. Illustration of the data subspace desensitization process.
First, the signal and noise subspaces spanned by the RDCA com-
ponents are derived from the data and the privacy label (top). Then,
the data are projected onto the privacy noise subspace to obtain the
desensitized data (bottom).
ratio between between-class scattering (signal) and total scattering
(total power). Hence, the noise subspace from RDCA corresponds to
the subspace where the distance scattering among samples is compa-
rable to the distance scattering among centroids of the classes. This
phenomenon is illustrated in Figure 1. For the mathematical discus-
sion and derivation of RDCA, we refer the readers to [17, 18, 15].
The essential property that is relevant to this proposed work is
the fact that RDCA has the capability to provide the signal and noise
subspaces with respect to a label. Given the discriminant compo-
nents derived from RDCA, {w1, w2, . . . , wL−1, wL, . . . , wM ; wi ∈
RM
}, as ordered by the decreasing discriminant power, the signal
and noise subspaces are therefore defined as span({w1, w2, . . . , wL−1}),
and span({wL, wL+1, . . . , wM }), respectively.
3.2. Desensitized Subspace and Desensitized Data
As RDCA can separate the signal and noise subspaces with respect
to a label, it lends itself nicely to the application of data desensitiza-
tion. By using the privacy label y(p)
to train RDCA, the privacy sig-
nal subspace, S
(p)
S = span({w
(p)
1 , w
(p)
2 , . . . , w
(p)
L−1}), and the pri-
vacy noise subspace, S
(p)
N = span({w
(p)
L , w
(p)
L+1, . . . , w
(p)
M }), can
be derived. Thus, by using only the privacy noise subspace for the
submitted data, the discriminant power of the privacy classes can be
minimized. Appropriately, the privacy noise subspace is called the
desensitized subspace, and the data projected onto this subspace are
referred to as the desensitized data. The procedure for producing the
desensitized data is summarized in Figure 2.
4. EXPERIMENTS
To provide real world examples of the three privacy problem classes
and show that RDCA can be used to protect privacy, we conducted
experiments on the Human Activity Recognition Using Smart-
phones, CMU Faces, and Semeion Handwritten Digit datasets. For
all experiments, SVM is used as the classifier for both utility and
privacy, and in the training phase, cross-validation is used to tune
the parameters.
4.1. HAR
Human Activity Recognition Using Smartphones (HAR) dataset
[19] aims at using mobile sensor signals (accelerometer and gyro-
scope) to predict activity being performed. The feature size of the
Label Random
Guess
Before
Desensitization
After
Desensitization
Activity
(Utility)
16.67% 97.62% 92.48%
Person
Identification
(Privacy)
5.26% 69.67% 7.02%
Table 1. Results from the HAR dataset.
dataset is 561. The data are collected from 19 individuals perform-
ing six activities. The dataset consists of 5379 samples for training
and 798 samples left out for testing. The activity is defined to be the
utility, y(u)
, whereas the person identification is defined to be the
privacy, y(p)
.
4.2. CMU Faces
CMU Faces dataset contains 640 grayscale images of 20 individuals
[20]. For each individual there is an image for every combination
of pose (straight, left, right, up), expression (neutral, happy, sad,
angry), and sunglasses (present or not). Images of size 32 by 30
pixels are used. Two experiments are performed:
• In the first experiment, the utility, y(u)
, is defined to be the
pose and the privacy, y(p)
, is the sunglasses indicator.
• In the second experiment, the utility, y(u)
, is defined to be the
sunglasses indicator and the privacy, y(p)
, is the pose.
4.3. Semeion Handwritten Digit
The Semeion Handwritten Digit dataset contains 1593 handwritten
digits from around 80 individuals. Every individual wrote each digit
from 0 to 9 twice. The samples were scanned to a 16x16 pixel
grayscale image. Each pixel was then thresholded to a Boolean value
[21]. For experiments on this dataset, y(u)
and y(p)
are construct by
grouping digits in the following ways:
• In the first experiment, the objective is to recognize digits 0
to 4 and protect digits 5 to 9. Utility, y(u)
, is equal to 0 to 4 if
the image is of such digit and it is equal to 5 if the image is of
5, 6, 7, 8 or 9. Similarly, privacy label, y(p)
, is equal to 5 to 9
if the image is of such digit and it is equal to 0 otherwise.
• For the second experiment, the values of the y(u)
and y(p)
are
swapped.
5. RESULTS
For all results, three accuracies are reported for comparison. The
random guess is the accuracy when no training is performed and the
prediction, hence, is made based on the frequency of the class in the
dataset. The accuracy before desensitization is resulted from the pre-
diction using full dimension of RDCA for the corresponding label,
along with the classifier. Finally, the accuracy after desensitization
is resulted from using the classifier on the desensitized data.
Table 1 reports the classification results on the HAR dataset. Ta-
ble 2 reports the results from both experiments on the CMU Faces
dataset. Finally, Table 3 and Table 4 report the results from the two
experiments on the Semeion Handwritten Digit dataset.
Experiment I
Label Random
Guess
Before
Desensitization
After
Desensitization
Pose
(Utility)
25.00% 83.30% 83.25%
Glasses
(Privacy)
50.00% 86.00% 50.47%
Experiment II
Label Random
Guess
Before
Desensitization
After
Desensitization
Glasses
(Utility)
50.00% 86.00% 85.97%
Pose
(Privacy)
25.00% 83.30% 25.00%
Table 2. Results from the two experiments on the CMU Faces
dataset.
Experiment I
Utility: 0-4
Digit Random
Guess
Before
Desensitization
After
Desensitization
0 10.0% 95.61% 92.86%
1 10.0% 84.10% 73.71%
2 10.0% 86.30% 80.80%
3 10.0% 76.70% 72.50%
4 10.0% 83.14% 75.33%
The Rest 50.0% 94.67% 92.46%
Privacy: 5-9
Digit Random
Guess
Before
Desensitization
After
Desensitization
5 10.0% 90.16% 0.00%
6 10.0% 82.40% 0.00%
7 10.0% 85.24% 0.00%
8 10.0% 83.50% 0.00%
9 10.0% 88.30% 0.00%
The Rest 50.0% 68.90% 99.86%
Table 3. Results from the first experiment on the Semeion Hand-
written Digit dataset, when the digits 0-4 are defined as the utility,
whereas the digits 5-9 are defined as the privacy.
Experiment II
Utility: 5-9
Digit Random
Guess
Before
Desensitization
After
Desensitization
5 10.0% 90.16% 75.20%
6 10.0% 82.40% 80.20%
7 10.0% 85.24% 81.70%
8 10.0% 83.50% 82.90%
9 10.0% 88.30% 64.90%
The Rest 50.0% 68.90% 86.50%
Privacy: 0-4
Digit Random
Guess
Before
Desensitization
After
Desensitization
0 10.0% 95.61% 0.01%
1 10.0% 84.10% 0.01%
2 10.0% 86.30% 0.00%
3 10.0% 76.70% 0.02%
4 10.0% 83.14% 0.01%
The Rest 50.0% 94.67% 100.00%
Table 4. Results from the second experiment on the Semeion Hand-
written Digit dataset, when the digits 5-9 are defined as the utility,
whereas the digits 0-4 are defined as the privacy.
6. DISCUSSION
6.1. The Effects of Desensitization on Privacy and Utility
Five experiments on the three datasets indicate that desensitization
by RDCA can effectively protect privacy with respect to the privacy
label. The privacy accuracies drop to the random guess level in both
HAR and CMU Faces datasets, while the privacy accuracies of the
privacy-sensitive digits are almost zero in the Semeion Handwritten
Digit dataset. Note that the reason the privacy accuracies approach
zero for the privacy-sensitive digits is because the classifier predicts
most samples to be in the “don’t care”, “The Rest”, class, which is
desirable for privacy under the scenario considered.
On the other hand, desensitization does not attenuate the utility
as significantly. It reduces the utility accuracies of the HAR exper-
iment and both experiments on CMU Faces by only 5.14%, 0.05%,
and 0.03%, respectively. On the Semeion Handwritten Digit exper-
iments, the utility accuracies also only drop by 7.53% on average
across all utility-relevant digits. This shows that desensitization can
be a viable tool in effectively protecting privacy, while still providing
good utility.
6.2. Unique-Unique Privacy Problem
One other variant of the privacy problem is Unique-Unique Privacy
problem. However, because all unique labels are surrogates for iden-
tity, and differ in name only, when the objective is to protect a unique
label while trying to predict another unique label, the problem is a
contradiction.
6.3. Future Works
RDCA approach to privacy should be extended to include regression.
This extension would be useful in a case where the utility is predict-
ing how much someone would be willing to spend on a house while
privacy is his savings account balance. Another useful extension is
making this method applicable to cases with multiple utility and pri-
vacy labels. For example predicting favorite activity and food while
protecting citizenship status and political affiliation. With those two
extensions, it would be interesting to try making all variables in the
dataset private except for the utility label or labels. Then it may be
possible to have a machine learning service in which the desensitized
data the user is submitting cannot be used to learn the original data.
7. CONCLUSION
We defined three privacy problem classes in machine learning for
classification, in which the common goal is to maximize the clas-
sification accuracy of the utility label, y(u)
, while minimizing the
classification accuracy of the privacy label, y(p)
. Common-Unique
Privacy problems have one label which is unique to each user.
Common-Common Privacy problems have both labels which are not
unique to users. Split-Label Privacy problems have y(u)
and y(p)
derived from a single label y where some classes are grouped.
Based on five experiments, we show that data desensitization
by RDCA can effectively protect privacy across all three problem
classes. On HAR and CMU Faces datasets, the use of desensitized
data results in random guess level accuracies for privacy label at a
cost of 5.14% and 0.04%, on average, drop in accuracy on the util-
ity label. For Semeion Handwritten Digit dataset, accuracies of the
privacy-sensitive digits are almost zero and the accuracies for utility-
relevant digits drop by 7.53% on average. In all experiments, the
tradeoffs between privacy and utility maybe acceptable and warrant
a further exploration and development of this method.
Acknowledgement
This material is based upon work supported in part by the Bran-
deis Program of the Defense Advanced Research Project Agency
(DARPA) and Space and Naval Warfare System Center Pacific (SSC
Pacific) under Contract No. 66001-15-C-4068. The authors wish
to thank Professor J. Morris Chang from Iowa State University for
invaluable discussion and assistances.
8. REFERENCES
[1] Privacy Rights Clearinghouse, “Chronology of data breaches,”
http://www.privacyrights.org/data-breach,
2016, Accessed: 2016-08-17.
[2] Arvind Narayanan and Vitaly Shmatikov, “How to break
anonymity of the netflix prize dataset,” CoRR, vol.
abs/cs/0610105, 2006.
[3] Arvind Narayanan, Hristo Paskov, Neil Zhenqiang Gong, John
Bethencourt, Emil Stefanov, Eui Chul Richard Shin, and Dawn
Song, “On the feasibility of internet-scale author identifica-
tion,” in 2012 IEEE Symposium on Security and Privacy. 2012,
pp. 300–314, IEEE.
[4] Joseph A. Calandrino, Ann Kilzer, Arvind Narayanan, Ed-
ward W. Felten, and Vitaly Shmatikov, “” you might also like:”
privacy risks of collaborative filtering,” in 2011 IEEE Sympo-
sium on Security and Privacy. 2011, pp. 231–246, IEEE.
[5] Michael Barbaro and Tom Zeller Jr., “A face is exposed for
aol searcher no. 4417749,” http://www.nytimes.com/
2006/08/09/technology/09aol.html, Aug 9, 2006
2006.
[6] Rakesh Agrawal and Ramakrishnan Srikant, “Privacy-
preserving data mining,” in ACM Sigmod Record. 2000,
vol. 29, pp. 439–450, ACM.
[7] Dakshi Agrawal and Charu C. Aggarwal, “On the design
and quantification of privacy preserving data mining algo-
rithms,” in Proceedings of the twentieth ACM SIGMOD-
SIGACT-SIGART symposium on Principles of database sys-
tems. 2001, pp. 247–255, ACM.
[8] Keke Chen and Ling Liu, “Privacy preserving data classifica-
tion with rotation perturbation,” in Data Mining, Fifth IEEE
International Conference on. 2005, p. 4 pp., IEEE.
[9] Kun Liu, Hillol Kargupta, and Jessica Ryan, “Random
projection-based multiplicative data perturbation for privacy
preserving distributed data mining,” Knowledge and Data En-
gineering, IEEE Transactions on, vol. 18, no. 1, pp. 92–106,
2006.
[10] Nico Schlitter and Jrg Lssig, “Distributed privacy preserving
classification based on local cluster identifiers,” in Trust, Se-
curity and Privacy in Computing and Communications (Trust-
Com), 2012 IEEE 11th International Conference on. 2012, pp.
1265–1272, IEEE.
[11] Hwanjo Yu, Xiaoqian Jiang, and Jaideep Vaidya, “Privacy-
preserving svm using nonlinear kernels on horizontally parti-
tioned data,” in Proceedings of the 2006 ACM symposium on
Applied computing. 2006, pp. 603–610, ACM.
[12] Keng-Pei Lin and Ming-Syan Chen, “On the design and anal-
ysis of the privacy-preserving svm classifier,” Knowledge and
Data Engineering, IEEE Transactions on, vol. 23, no. 11, pp.
1704–1717, 2011.
[13] Valeria Nikolaenko, Udi Weinsberg, Sotiris Ioannidis, Marc
Joye, Dan Boneh, and Nina Taft, “Privacy-preserving ridge
regression on hundreds of millions of records,” in Security and
Privacy (SP), 2013 IEEE Symposium on. 2013, pp. 334–348,
IEEE.
[14] S. Y. Kung, “Compressive privacy: From informa-
tion/estimation theory to machine learning,” to appear on
IEEE Signal Processing Magazine, submitted 2016, 2016.
[15] S. Y. Kung, Thee Chanyaswad, J. Morris Chang, and Peiyuan
Wu, “Collaborative pca/dca learning methods for compres-
sive privacy,” to appear on ACM Transactions on Embed-
ded Computing Systems, Special Issue on Effective Divide-and-
Conquer, Incremental, or Distributed Mechanisms of Embed-
ded Designs for Extremely Big Data in Large-Scale Devices,
submitted 2016, 2016.
[16] Thee Chanyaswad, J. Morris Chang, Prateek Mittal, and
S. Y. Kung, “Discriminant-component eigenfaces for privacy-
preserving face recognition,” in to appear on MLSP: Machine
Learning for Signal Processing, submitted 2016, 2016.
[17] Sun-Yuan Kung, “Discriminant component analysis for pri-
vacy protection and visualization of big data,” Multimedia
Tools and Applications, pp. 1–36, 2015.
[18] S. Y. Kung, Kernel Methods and Machine Learning, Cam-
bridge University Press, Cambridge, UK, 2014.
[19] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra,
and Jorge Luis Reyes-Ortiz, “A public domain dataset for
human activity recognition using smartphones.,” in ESANN,
2013.
[20] Tom Mitchell, “Cmu face images data set,” https:
//archive.ics.uci.edu/ml/datasets/CMU+
Face+Images, 1997, Accessed: 2016-9-3.
[21] “Semeion handwritten digit data set,” Semeion Research
Center of Sciences of Communication, via Sersale 117,
00128 Rome, Italy Tattile Via Gaetano Donizetti, 1-3-5,25030
Mairano (Brescia), Italy.

More Related Content

What's hot

A Hybrid Approach for Ensuring Security in Data Communication
A Hybrid Approach for Ensuring Security in Data Communication A Hybrid Approach for Ensuring Security in Data Communication
A Hybrid Approach for Ensuring Security in Data Communication cscpconf
 
Cb pattern trees identifying
Cb pattern trees  identifyingCb pattern trees  identifying
Cb pattern trees identifyingIJDKP
 
Data Hiding In Medical Images by Preserving Integrity of ROI Using Semi-Rever...
Data Hiding In Medical Images by Preserving Integrity of ROI Using Semi-Rever...Data Hiding In Medical Images by Preserving Integrity of ROI Using Semi-Rever...
Data Hiding In Medical Images by Preserving Integrity of ROI Using Semi-Rever...IJERA Editor
 
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...SBGC
 
expeditions praneeth_june-2021
expeditions praneeth_june-2021expeditions praneeth_june-2021
expeditions praneeth_june-2021Praneeth Vepakomma
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 
Hash Visualization a New Technique to improve Real World CyberSecurity
Hash Visualization a New Technique to improve Real World CyberSecurityHash Visualization a New Technique to improve Real World CyberSecurity
Hash Visualization a New Technique to improve Real World CyberSecurityIan Beckett
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayeseSAT Journals
 
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...IJCNCJournal
 
Paper id 212014109
Paper id 212014109Paper id 212014109
Paper id 212014109IJRAT
 
Privacy preserving relative location based services for mobile users
Privacy preserving relative location based services for mobile usersPrivacy preserving relative location based services for mobile users
Privacy preserving relative location based services for mobile usersLeMeniz Infotech
 
Implementation of digital image watermarking techniques using dwt and dwt svd...
Implementation of digital image watermarking techniques using dwt and dwt svd...Implementation of digital image watermarking techniques using dwt and dwt svd...
Implementation of digital image watermarking techniques using dwt and dwt svd...eSAT Journals
 
856200902 a06
856200902 a06856200902 a06
856200902 a06amosalade
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reorderingiaemedu
 
Indoor localization Leveraging Human Perception of Textual Signs
Indoor localization Leveraging Human Perception of Textual SignsIndoor localization Leveraging Human Perception of Textual Signs
Indoor localization Leveraging Human Perception of Textual SignsShekhar Vimalendu
 
IRJET- Quantify Mutually Dependent Privacy Risks with Locality Data
IRJET- Quantify Mutually Dependent Privacy Risks with Locality DataIRJET- Quantify Mutually Dependent Privacy Risks with Locality Data
IRJET- Quantify Mutually Dependent Privacy Risks with Locality DataIRJET Journal
 
UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO...
UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO...UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO...
UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO...IJCNCJournal
 
Text classification based on gated recurrent unit combines with support vecto...
Text classification based on gated recurrent unit combines with support vecto...Text classification based on gated recurrent unit combines with support vecto...
Text classification based on gated recurrent unit combines with support vecto...IJECEIAES
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...IJECEIAES
 

What's hot (20)

A Hybrid Approach for Ensuring Security in Data Communication
A Hybrid Approach for Ensuring Security in Data Communication A Hybrid Approach for Ensuring Security in Data Communication
A Hybrid Approach for Ensuring Security in Data Communication
 
Cb pattern trees identifying
Cb pattern trees  identifyingCb pattern trees  identifying
Cb pattern trees identifying
 
Data Hiding In Medical Images by Preserving Integrity of ROI Using Semi-Rever...
Data Hiding In Medical Images by Preserving Integrity of ROI Using Semi-Rever...Data Hiding In Medical Images by Preserving Integrity of ROI Using Semi-Rever...
Data Hiding In Medical Images by Preserving Integrity of ROI Using Semi-Rever...
 
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...
 
expeditions praneeth_june-2021
expeditions praneeth_june-2021expeditions praneeth_june-2021
expeditions praneeth_june-2021
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
Hash Visualization a New Technique to improve Real World CyberSecurity
Hash Visualization a New Technique to improve Real World CyberSecurityHash Visualization a New Technique to improve Real World CyberSecurity
Hash Visualization a New Technique to improve Real World CyberSecurity
 
High performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayesHigh performance intrusion detection using modified k mean & naïve bayes
High performance intrusion detection using modified k mean & naïve bayes
 
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
 
Paper id 212014109
Paper id 212014109Paper id 212014109
Paper id 212014109
 
Privacy preserving relative location based services for mobile users
Privacy preserving relative location based services for mobile usersPrivacy preserving relative location based services for mobile users
Privacy preserving relative location based services for mobile users
 
Implementation of digital image watermarking techniques using dwt and dwt svd...
Implementation of digital image watermarking techniques using dwt and dwt svd...Implementation of digital image watermarking techniques using dwt and dwt svd...
Implementation of digital image watermarking techniques using dwt and dwt svd...
 
856200902 a06
856200902 a06856200902 a06
856200902 a06
 
Survey on transaction reordering
Survey on transaction reorderingSurvey on transaction reordering
Survey on transaction reordering
 
Indoor localization Leveraging Human Perception of Textual Signs
Indoor localization Leveraging Human Perception of Textual SignsIndoor localization Leveraging Human Perception of Textual Signs
Indoor localization Leveraging Human Perception of Textual Signs
 
Sangeetha seminar (1)
Sangeetha  seminar (1)Sangeetha  seminar (1)
Sangeetha seminar (1)
 
IRJET- Quantify Mutually Dependent Privacy Risks with Locality Data
IRJET- Quantify Mutually Dependent Privacy Risks with Locality DataIRJET- Quantify Mutually Dependent Privacy Risks with Locality Data
IRJET- Quantify Mutually Dependent Privacy Risks with Locality Data
 
UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO...
UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO...UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO...
UTILIZING XAI TECHNIQUE TO IMPROVE AUTOENCODER BASED MODEL FOR COMPUTER NETWO...
 
Text classification based on gated recurrent unit combines with support vecto...
Text classification based on gated recurrent unit combines with support vecto...Text classification based on gated recurrent unit combines with support vecto...
Text classification based on gated recurrent unit combines with support vecto...
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...
 

Similar to Desensitized RDCA Subspaces for Compressive Privacy in Machine Learning

Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining14894
 
Privacy Preserving Clustering on Distorted data
Privacy Preserving Clustering on Distorted dataPrivacy Preserving Clustering on Distorted data
Privacy Preserving Clustering on Distorted dataIOSR Journals
 
journal for research
journal for researchjournal for research
journal for researchrikaseorika
 
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
 
IRJET - Random Data Perturbation Techniques in Privacy Preserving Data Mi...
IRJET -  	  Random Data Perturbation Techniques in Privacy Preserving Data Mi...IRJET -  	  Random Data Perturbation Techniques in Privacy Preserving Data Mi...
IRJET - Random Data Perturbation Techniques in Privacy Preserving Data Mi...IRJET Journal
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Filtering of Frequency Components for Privacy Preserving Facial Recognition
Filtering of Frequency Components for Privacy Preserving Facial RecognitionFiltering of Frequency Components for Privacy Preserving Facial Recognition
Filtering of Frequency Components for Privacy Preserving Facial RecognitionArtur Filipowicz
 
Data Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context MissionsData Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context Missionsijdms
 
ACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTS
ACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTSACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTS
ACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTSIJCNCJournal
 
Actor Critic Approach based Anomaly Detection for Edge Computing Environments
Actor Critic Approach based Anomaly Detection for Edge Computing EnvironmentsActor Critic Approach based Anomaly Detection for Edge Computing Environments
Actor Critic Approach based Anomaly Detection for Edge Computing EnvironmentsIJCNCJournal
 
Proximity aware local-recoding anonymization with map reduce for scalable big...
Proximity aware local-recoding anonymization with map reduce for scalable big...Proximity aware local-recoding anonymization with map reduce for scalable big...
Proximity aware local-recoding anonymization with map reduce for scalable big...Nexgen Technology
 
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...cscpconf
 
A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...csandit
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...Kato Mivule
 
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...Kato Mivule
 
Distance based transformation for privacy preserving data mining using hybrid...
Distance based transformation for privacy preserving data mining using hybrid...Distance based transformation for privacy preserving data mining using hybrid...
Distance based transformation for privacy preserving data mining using hybrid...csandit
 
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for datahemanthbbc
 

Similar to Desensitized RDCA Subspaces for Compressive Privacy in Machine Learning (20)

Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining
 
Privacy Preserving Clustering on Distorted data
Privacy Preserving Clustering on Distorted dataPrivacy Preserving Clustering on Distorted data
Privacy Preserving Clustering on Distorted data
 
journal for research
journal for researchjournal for research
journal for research
 
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
 
IRJET - Random Data Perturbation Techniques in Privacy Preserving Data Mi...
IRJET -  	  Random Data Perturbation Techniques in Privacy Preserving Data Mi...IRJET -  	  Random Data Perturbation Techniques in Privacy Preserving Data Mi...
IRJET - Random Data Perturbation Techniques in Privacy Preserving Data Mi...
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Filtering of Frequency Components for Privacy Preserving Facial Recognition
Filtering of Frequency Components for Privacy Preserving Facial RecognitionFiltering of Frequency Components for Privacy Preserving Facial Recognition
Filtering of Frequency Components for Privacy Preserving Facial Recognition
 
Data Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context MissionsData Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context Missions
 
ACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTS
ACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTSACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTS
ACTOR CRITIC APPROACH BASED ANOMALY DETECTION FOR EDGE COMPUTING ENVIRONMENTS
 
Actor Critic Approach based Anomaly Detection for Edge Computing Environments
Actor Critic Approach based Anomaly Detection for Edge Computing EnvironmentsActor Critic Approach based Anomaly Detection for Edge Computing Environments
Actor Critic Approach based Anomaly Detection for Edge Computing Environments
 
Proximity aware local-recoding anonymization with map reduce for scalable big...
Proximity aware local-recoding anonymization with map reduce for scalable big...Proximity aware local-recoding anonymization with map reduce for scalable big...
Proximity aware local-recoding anonymization with map reduce for scalable big...
 
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
 
A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
 
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
 
winbis1005
winbis1005winbis1005
winbis1005
 
Distance based transformation for privacy preserving data mining using hybrid...
Distance based transformation for privacy preserving data mining using hybrid...Distance based transformation for privacy preserving data mining using hybrid...
Distance based transformation for privacy preserving data mining using hybrid...
 
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...
 
Data attribute security and privacy in Collaborative distributed database Pub...
Data attribute security and privacy in Collaborative distributed database Pub...Data attribute security and privacy in Collaborative distributed database Pub...
Data attribute security and privacy in Collaborative distributed database Pub...
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for data
 

More from Artur Filipowicz

Smart Safety for Commercial Vehicles (ENG)
Smart Safety for Commercial Vehicles (ENG)Smart Safety for Commercial Vehicles (ENG)
Smart Safety for Commercial Vehicles (ENG)Artur Filipowicz
 
Smart Safety for Commercial Vehicles (中文)
Smart Safety for Commercial Vehicles (中文)Smart Safety for Commercial Vehicles (中文)
Smart Safety for Commercial Vehicles (中文)Artur Filipowicz
 
Incorporating Learning Strategies in Training of Deep Neural Networks for Au...
Incorporating Learning Strategies in Training of Deep Neural  Networks for Au...Incorporating Learning Strategies in Training of Deep Neural  Networks for Au...
Incorporating Learning Strategies in Training of Deep Neural Networks for Au...Artur Filipowicz
 
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...Artur Filipowicz
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Artur Filipowicz
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Artur Filipowicz
 
Direct Perception for Congestion Scene Detection Using TensorFlow
Direct Perception for Congestion Scene Detection Using TensorFlowDirect Perception for Congestion Scene Detection Using TensorFlow
Direct Perception for Congestion Scene Detection Using TensorFlowArtur Filipowicz
 
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...Artur Filipowicz
 
Video Games for Autonomous Driving
Video Games for Autonomous DrivingVideo Games for Autonomous Driving
Video Games for Autonomous DrivingArtur Filipowicz
 

More from Artur Filipowicz (9)

Smart Safety for Commercial Vehicles (ENG)
Smart Safety for Commercial Vehicles (ENG)Smart Safety for Commercial Vehicles (ENG)
Smart Safety for Commercial Vehicles (ENG)
 
Smart Safety for Commercial Vehicles (中文)
Smart Safety for Commercial Vehicles (中文)Smart Safety for Commercial Vehicles (中文)
Smart Safety for Commercial Vehicles (中文)
 
Incorporating Learning Strategies in Training of Deep Neural Networks for Au...
Incorporating Learning Strategies in Training of Deep Neural  Networks for Au...Incorporating Learning Strategies in Training of Deep Neural  Networks for Au...
Incorporating Learning Strategies in Training of Deep Neural Networks for Au...
 
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
 
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
Virtual Environments as Driving Schools for Deep Learning Vision-Based Sensor...
 
Direct Perception for Congestion Scene Detection Using TensorFlow
Direct Perception for Congestion Scene Detection Using TensorFlowDirect Perception for Congestion Scene Detection Using TensorFlow
Direct Perception for Congestion Scene Detection Using TensorFlow
 
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
Learning to Recognize Distance to Stop Signs Using the Virtual World of Grand...
 
Video Games for Autonomous Driving
Video Games for Autonomous DrivingVideo Games for Autonomous Driving
Video Games for Autonomous Driving
 

Recently uploaded

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 

Recently uploaded (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 

Desensitized RDCA Subspaces for Compressive Privacy in Machine Learning

  • 1. DESENSITIZED RDCA SUBSPACES FOR COMPRESSIVE PRIVACY IN MACHINE LEARNING Artur Filipowicz Thee Chanyaswad S.Y. Kung Princeton University Princeton, NJ ABSTRACT The quest for better data analysis and artificial intelligence has lead to more and more data being collected and stored. As a consequence, more data are exposed to malicious entities. This paper examines the problem of privacy in machine learning for classification. We utilize the Ridge Discriminant Component Analysis (RDCA) to desensitize data with respect to a privacy label. Based on five experiments, we show that desensitization by RDCA can effectively protect privacy (i.e. low accuracy on the privacy label) with small loss in utility. On HAR and CMU Faces datasets, the use of desensitized data re- sults in random guess level accuracies for privacy at a cost of 5.14% and 0.04%, on average, drop in the utility accuracies. For Semeion Handwritten Digit dataset, accuracies of the privacy-sensitive digits are almost zero, while the accuracies for the utility-relevant digits drop by 7.53% on average. This presents a promising solution to the problem of privacy in machine learning for classification. Index Terms— Compressive Privacy, Ridge Discriminant Com- ponent Analysis (RDCA), Privacy-preserving Data Mining/Machine Learning, Data Desensitization, Dimension Reduction 1. INTRODUCTION Innovation in the 21st Century electronics centers around data pro- cessing. Progress is fueled by the symbiotic relationship between big data and machine learning in which machine learning allows us to interpret big data and big data allows us to train large machine learning models. In the world of big data, videos, photos, emails, banking transactions, browsing history, GPS tracks, and other per- sonal data are continuously collected and stored by organizations for analysis. These data may be circulated around the Internet without the data owner’s knowledge and be at risk of exposure to malicious entities. A few recent data leakages are described by [1], and many other possible attacks on privacy have been reported or proposed [2, 3, 4, 5]. The complete problem of maintaining privacy is complex. It is distributed temporally since data owner’s present and past actions can compromise privacy. It is distributed spatially as the data owner has personal information in multiple accounts, devices, and physi- cal locations. Our focus is privacy protection in the context of ma- chine learning for classification, at the time and location the data owner, the user, submits his/her data to a machine learning service, the server. Thanks to the Brandeis Program of the Defense Advanced Research Project Agency (DARPA) and Space and Naval Warfare System Center Pa- cific (SSC Pacific) under Contract No. 66001-15-C-4068 for funding support, and to Professor J. Morris Chang from Iowa State University for invaluable discussion and assistances. A classical solution to this problem is encryption. The user en- crypts his/her information before submission, and the server decodes the submitted data. However, the server may leak these data to mali- cious entities. Therefore, it should not be trusted and should not re- ceive information which compromises privacy. In machine learning for classification, this is information which maximizes the classifica- tion accuracy of the utility label while minimizing the classification accuracy of the privacy label. Several different ideas have been pro- posed to attack this problem such as noisy data reconstruction [6, 7], rotational and random perturbation [8, 9], microaggregation of data [10], privacy-centric classifier designs [11, 12, 13], etc. Our method is based upon Compressive Privacy [14, 15, 16] ap- proach to this problem. It utilizes the concept of data desensitization – modifying the data by reducing the number of features such that the privacy is protected. We employ Ridge Discriminant Compo- nent Analysis (RDCA) [17, 18, 15] to desensitize data before they are submitted to a server. By deriving the RDCA components with respect to the privacy label, two subspaces are attained – the privacy signal and privacy noise subspaces. The privacy noise subspace is the subspace which has minimal classification power with respect to the privacy label. Therefore, the proposed method utilizes this pri- vacy noise subspace to project the data onto in order to desensitize the data. Even if the sever leaks the desensitized data, a malicious entity cannot use these data to classify the user under the privacy label. Based on the properties of the utility and privacy labels, we de- fine three problem classes – Common-Unique Privacy, Common- Common Privacy and Split-Label Privacy problems. To test the po- tency of our method on our three different classes, we present an ex- ample dataset – HAR, CMU Faces, and Semeion Handwritten Digit – for each class and show that desensitization by RDCA can effec- tively protect privacy by reducing the privacy accuracy to the ran- dom guess level in the HAR and CMU Faces datasets, and to almost zero on the privacy-sensitive digits in the Semeion Handwritten Digit dataset. On the other hand, the utility accuracies only drop by 5.14% in the HAR dataset, on average by 0.04% in the CMU Faces dataset, and on average by 7.53% on the utility-relevant digits in the Semeion Handwritten Digit dataset. This confirms that the proposed desensi- tization method by RDCA is promising in providing a solution in privacy-preserving machine learning. 2. PRIVACY PROBLEM CLASSES In a standard classification problem, for a given set of supervised training data of N samples and M features, {X, y}, a classifier is trained to predict y based on X. When considering privacy, we de- fine a privacy label y(p) and a utility label y(u) . Our objective is then to minimize the possibility of y(p) being predicted based on X
  • 2. Fig. 1. Illustration for the RDCA signal and noise subspaces with M = 2 and L = 2. In the signal subspace (left), the distance be- tween class centroids is high, where as in the noise subspace (right), the distance between class centroids is low relative to the distances among all of the samples. while maximizing our classifier’s ability to predict y(u) . Based on this approach, we define three classes of problems. To define these problem classes, we first introduce the ideas of a common and unique label. A unique label has a different class for each user, for example social security number. A common label has classes which are shared by multiple users, eye color being an example. The three problems we define are Common-Unique Privacy, Common-Common Privacy, and Split-Label Privacy. • In Common-Unique Privacy problem, y(u) or y(p) is a com- mon label while the other label is unique. An application example we explore is human activity recognition. In this ex- ample, the unique label is the identity of the user submitting activity data, while the common label is the type of activity (walking, running, etc.) performed by the user. • In Common-Common Privacy problem, both y(u) and y(p) are common labels. An example we examine is facial feature recognition. Specifically, for each image, there are two com- mon labels indicating if the user is wearing sunglasses and his pose. • Lastly, in Split-Label Privacy problem, y(u) and y(p) are de- rived from a single label y where some classes are grouped. Our example is Optical Character Recognition where we wish to recognize digits 0 to 4 while protecting digits 5 to 9 from being recognized. The digits may be responses on a survey about marriage where 0 to 4 represents single-never- married, single-divorced, single-widowed etc. and digits 5 to 9 represent married-male-female, married-male-male, etc. In this case, the response between 5-9 may leak private infor- mation about sexual orientation, and therefore should not be uniquely identifiable, unlike the utility digits 0-4, of which unique identifiability may be useful for marketing purposes. 3. DESENSITIZATION BY RDCA SUBSPACE PROJECTION 3.1. Ridge Discriminant Component Analysis (RDCA) Ridge Discriminant Component Analysis (RDCA) [17, 18, 15] aims at finding the subspace that maximizes the discriminant distance among the classes. Given an L-class classification problem, RDCA is able to provide the (L − 1)-dimensional subspace where all dis- criminant power lies, and the remaining subspace where no discrim- inant power remains. Conceptually, RDCA aims at maximizing the Fig. 2. Illustration of the data subspace desensitization process. First, the signal and noise subspaces spanned by the RDCA com- ponents are derived from the data and the privacy label (top). Then, the data are projected onto the privacy noise subspace to obtain the desensitized data (bottom). ratio between between-class scattering (signal) and total scattering (total power). Hence, the noise subspace from RDCA corresponds to the subspace where the distance scattering among samples is compa- rable to the distance scattering among centroids of the classes. This phenomenon is illustrated in Figure 1. For the mathematical discus- sion and derivation of RDCA, we refer the readers to [17, 18, 15]. The essential property that is relevant to this proposed work is the fact that RDCA has the capability to provide the signal and noise subspaces with respect to a label. Given the discriminant compo- nents derived from RDCA, {w1, w2, . . . , wL−1, wL, . . . , wM ; wi ∈ RM }, as ordered by the decreasing discriminant power, the signal and noise subspaces are therefore defined as span({w1, w2, . . . , wL−1}), and span({wL, wL+1, . . . , wM }), respectively. 3.2. Desensitized Subspace and Desensitized Data As RDCA can separate the signal and noise subspaces with respect to a label, it lends itself nicely to the application of data desensitiza- tion. By using the privacy label y(p) to train RDCA, the privacy sig- nal subspace, S (p) S = span({w (p) 1 , w (p) 2 , . . . , w (p) L−1}), and the pri- vacy noise subspace, S (p) N = span({w (p) L , w (p) L+1, . . . , w (p) M }), can be derived. Thus, by using only the privacy noise subspace for the submitted data, the discriminant power of the privacy classes can be minimized. Appropriately, the privacy noise subspace is called the desensitized subspace, and the data projected onto this subspace are referred to as the desensitized data. The procedure for producing the desensitized data is summarized in Figure 2. 4. EXPERIMENTS To provide real world examples of the three privacy problem classes and show that RDCA can be used to protect privacy, we conducted experiments on the Human Activity Recognition Using Smart- phones, CMU Faces, and Semeion Handwritten Digit datasets. For all experiments, SVM is used as the classifier for both utility and privacy, and in the training phase, cross-validation is used to tune the parameters. 4.1. HAR Human Activity Recognition Using Smartphones (HAR) dataset [19] aims at using mobile sensor signals (accelerometer and gyro- scope) to predict activity being performed. The feature size of the
  • 3. Label Random Guess Before Desensitization After Desensitization Activity (Utility) 16.67% 97.62% 92.48% Person Identification (Privacy) 5.26% 69.67% 7.02% Table 1. Results from the HAR dataset. dataset is 561. The data are collected from 19 individuals perform- ing six activities. The dataset consists of 5379 samples for training and 798 samples left out for testing. The activity is defined to be the utility, y(u) , whereas the person identification is defined to be the privacy, y(p) . 4.2. CMU Faces CMU Faces dataset contains 640 grayscale images of 20 individuals [20]. For each individual there is an image for every combination of pose (straight, left, right, up), expression (neutral, happy, sad, angry), and sunglasses (present or not). Images of size 32 by 30 pixels are used. Two experiments are performed: • In the first experiment, the utility, y(u) , is defined to be the pose and the privacy, y(p) , is the sunglasses indicator. • In the second experiment, the utility, y(u) , is defined to be the sunglasses indicator and the privacy, y(p) , is the pose. 4.3. Semeion Handwritten Digit The Semeion Handwritten Digit dataset contains 1593 handwritten digits from around 80 individuals. Every individual wrote each digit from 0 to 9 twice. The samples were scanned to a 16x16 pixel grayscale image. Each pixel was then thresholded to a Boolean value [21]. For experiments on this dataset, y(u) and y(p) are construct by grouping digits in the following ways: • In the first experiment, the objective is to recognize digits 0 to 4 and protect digits 5 to 9. Utility, y(u) , is equal to 0 to 4 if the image is of such digit and it is equal to 5 if the image is of 5, 6, 7, 8 or 9. Similarly, privacy label, y(p) , is equal to 5 to 9 if the image is of such digit and it is equal to 0 otherwise. • For the second experiment, the values of the y(u) and y(p) are swapped. 5. RESULTS For all results, three accuracies are reported for comparison. The random guess is the accuracy when no training is performed and the prediction, hence, is made based on the frequency of the class in the dataset. The accuracy before desensitization is resulted from the pre- diction using full dimension of RDCA for the corresponding label, along with the classifier. Finally, the accuracy after desensitization is resulted from using the classifier on the desensitized data. Table 1 reports the classification results on the HAR dataset. Ta- ble 2 reports the results from both experiments on the CMU Faces dataset. Finally, Table 3 and Table 4 report the results from the two experiments on the Semeion Handwritten Digit dataset. Experiment I Label Random Guess Before Desensitization After Desensitization Pose (Utility) 25.00% 83.30% 83.25% Glasses (Privacy) 50.00% 86.00% 50.47% Experiment II Label Random Guess Before Desensitization After Desensitization Glasses (Utility) 50.00% 86.00% 85.97% Pose (Privacy) 25.00% 83.30% 25.00% Table 2. Results from the two experiments on the CMU Faces dataset. Experiment I Utility: 0-4 Digit Random Guess Before Desensitization After Desensitization 0 10.0% 95.61% 92.86% 1 10.0% 84.10% 73.71% 2 10.0% 86.30% 80.80% 3 10.0% 76.70% 72.50% 4 10.0% 83.14% 75.33% The Rest 50.0% 94.67% 92.46% Privacy: 5-9 Digit Random Guess Before Desensitization After Desensitization 5 10.0% 90.16% 0.00% 6 10.0% 82.40% 0.00% 7 10.0% 85.24% 0.00% 8 10.0% 83.50% 0.00% 9 10.0% 88.30% 0.00% The Rest 50.0% 68.90% 99.86% Table 3. Results from the first experiment on the Semeion Hand- written Digit dataset, when the digits 0-4 are defined as the utility, whereas the digits 5-9 are defined as the privacy.
  • 4. Experiment II Utility: 5-9 Digit Random Guess Before Desensitization After Desensitization 5 10.0% 90.16% 75.20% 6 10.0% 82.40% 80.20% 7 10.0% 85.24% 81.70% 8 10.0% 83.50% 82.90% 9 10.0% 88.30% 64.90% The Rest 50.0% 68.90% 86.50% Privacy: 0-4 Digit Random Guess Before Desensitization After Desensitization 0 10.0% 95.61% 0.01% 1 10.0% 84.10% 0.01% 2 10.0% 86.30% 0.00% 3 10.0% 76.70% 0.02% 4 10.0% 83.14% 0.01% The Rest 50.0% 94.67% 100.00% Table 4. Results from the second experiment on the Semeion Hand- written Digit dataset, when the digits 5-9 are defined as the utility, whereas the digits 0-4 are defined as the privacy. 6. DISCUSSION 6.1. The Effects of Desensitization on Privacy and Utility Five experiments on the three datasets indicate that desensitization by RDCA can effectively protect privacy with respect to the privacy label. The privacy accuracies drop to the random guess level in both HAR and CMU Faces datasets, while the privacy accuracies of the privacy-sensitive digits are almost zero in the Semeion Handwritten Digit dataset. Note that the reason the privacy accuracies approach zero for the privacy-sensitive digits is because the classifier predicts most samples to be in the “don’t care”, “The Rest”, class, which is desirable for privacy under the scenario considered. On the other hand, desensitization does not attenuate the utility as significantly. It reduces the utility accuracies of the HAR exper- iment and both experiments on CMU Faces by only 5.14%, 0.05%, and 0.03%, respectively. On the Semeion Handwritten Digit exper- iments, the utility accuracies also only drop by 7.53% on average across all utility-relevant digits. This shows that desensitization can be a viable tool in effectively protecting privacy, while still providing good utility. 6.2. Unique-Unique Privacy Problem One other variant of the privacy problem is Unique-Unique Privacy problem. However, because all unique labels are surrogates for iden- tity, and differ in name only, when the objective is to protect a unique label while trying to predict another unique label, the problem is a contradiction. 6.3. Future Works RDCA approach to privacy should be extended to include regression. This extension would be useful in a case where the utility is predict- ing how much someone would be willing to spend on a house while privacy is his savings account balance. Another useful extension is making this method applicable to cases with multiple utility and pri- vacy labels. For example predicting favorite activity and food while protecting citizenship status and political affiliation. With those two extensions, it would be interesting to try making all variables in the dataset private except for the utility label or labels. Then it may be possible to have a machine learning service in which the desensitized data the user is submitting cannot be used to learn the original data. 7. CONCLUSION We defined three privacy problem classes in machine learning for classification, in which the common goal is to maximize the clas- sification accuracy of the utility label, y(u) , while minimizing the classification accuracy of the privacy label, y(p) . Common-Unique Privacy problems have one label which is unique to each user. Common-Common Privacy problems have both labels which are not unique to users. Split-Label Privacy problems have y(u) and y(p) derived from a single label y where some classes are grouped. Based on five experiments, we show that data desensitization by RDCA can effectively protect privacy across all three problem classes. On HAR and CMU Faces datasets, the use of desensitized data results in random guess level accuracies for privacy label at a cost of 5.14% and 0.04%, on average, drop in accuracy on the util- ity label. For Semeion Handwritten Digit dataset, accuracies of the privacy-sensitive digits are almost zero and the accuracies for utility- relevant digits drop by 7.53% on average. In all experiments, the tradeoffs between privacy and utility maybe acceptable and warrant a further exploration and development of this method. Acknowledgement This material is based upon work supported in part by the Bran- deis Program of the Defense Advanced Research Project Agency (DARPA) and Space and Naval Warfare System Center Pacific (SSC Pacific) under Contract No. 66001-15-C-4068. The authors wish to thank Professor J. Morris Chang from Iowa State University for invaluable discussion and assistances. 8. REFERENCES [1] Privacy Rights Clearinghouse, “Chronology of data breaches,” http://www.privacyrights.org/data-breach, 2016, Accessed: 2016-08-17. [2] Arvind Narayanan and Vitaly Shmatikov, “How to break anonymity of the netflix prize dataset,” CoRR, vol. abs/cs/0610105, 2006.
  • 5. [3] Arvind Narayanan, Hristo Paskov, Neil Zhenqiang Gong, John Bethencourt, Emil Stefanov, Eui Chul Richard Shin, and Dawn Song, “On the feasibility of internet-scale author identifica- tion,” in 2012 IEEE Symposium on Security and Privacy. 2012, pp. 300–314, IEEE. [4] Joseph A. Calandrino, Ann Kilzer, Arvind Narayanan, Ed- ward W. Felten, and Vitaly Shmatikov, “” you might also like:” privacy risks of collaborative filtering,” in 2011 IEEE Sympo- sium on Security and Privacy. 2011, pp. 231–246, IEEE. [5] Michael Barbaro and Tom Zeller Jr., “A face is exposed for aol searcher no. 4417749,” http://www.nytimes.com/ 2006/08/09/technology/09aol.html, Aug 9, 2006 2006. [6] Rakesh Agrawal and Ramakrishnan Srikant, “Privacy- preserving data mining,” in ACM Sigmod Record. 2000, vol. 29, pp. 439–450, ACM. [7] Dakshi Agrawal and Charu C. Aggarwal, “On the design and quantification of privacy preserving data mining algo- rithms,” in Proceedings of the twentieth ACM SIGMOD- SIGACT-SIGART symposium on Principles of database sys- tems. 2001, pp. 247–255, ACM. [8] Keke Chen and Ling Liu, “Privacy preserving data classifica- tion with rotation perturbation,” in Data Mining, Fifth IEEE International Conference on. 2005, p. 4 pp., IEEE. [9] Kun Liu, Hillol Kargupta, and Jessica Ryan, “Random projection-based multiplicative data perturbation for privacy preserving distributed data mining,” Knowledge and Data En- gineering, IEEE Transactions on, vol. 18, no. 1, pp. 92–106, 2006. [10] Nico Schlitter and Jrg Lssig, “Distributed privacy preserving classification based on local cluster identifiers,” in Trust, Se- curity and Privacy in Computing and Communications (Trust- Com), 2012 IEEE 11th International Conference on. 2012, pp. 1265–1272, IEEE. [11] Hwanjo Yu, Xiaoqian Jiang, and Jaideep Vaidya, “Privacy- preserving svm using nonlinear kernels on horizontally parti- tioned data,” in Proceedings of the 2006 ACM symposium on Applied computing. 2006, pp. 603–610, ACM. [12] Keng-Pei Lin and Ming-Syan Chen, “On the design and anal- ysis of the privacy-preserving svm classifier,” Knowledge and Data Engineering, IEEE Transactions on, vol. 23, no. 11, pp. 1704–1717, 2011. [13] Valeria Nikolaenko, Udi Weinsberg, Sotiris Ioannidis, Marc Joye, Dan Boneh, and Nina Taft, “Privacy-preserving ridge regression on hundreds of millions of records,” in Security and Privacy (SP), 2013 IEEE Symposium on. 2013, pp. 334–348, IEEE. [14] S. Y. Kung, “Compressive privacy: From informa- tion/estimation theory to machine learning,” to appear on IEEE Signal Processing Magazine, submitted 2016, 2016. [15] S. Y. Kung, Thee Chanyaswad, J. Morris Chang, and Peiyuan Wu, “Collaborative pca/dca learning methods for compres- sive privacy,” to appear on ACM Transactions on Embed- ded Computing Systems, Special Issue on Effective Divide-and- Conquer, Incremental, or Distributed Mechanisms of Embed- ded Designs for Extremely Big Data in Large-Scale Devices, submitted 2016, 2016. [16] Thee Chanyaswad, J. Morris Chang, Prateek Mittal, and S. Y. Kung, “Discriminant-component eigenfaces for privacy- preserving face recognition,” in to appear on MLSP: Machine Learning for Signal Processing, submitted 2016, 2016. [17] Sun-Yuan Kung, “Discriminant component analysis for pri- vacy protection and visualization of big data,” Multimedia Tools and Applications, pp. 1–36, 2015. [18] S. Y. Kung, Kernel Methods and Machine Learning, Cam- bridge University Press, Cambridge, UK, 2014. [19] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge Luis Reyes-Ortiz, “A public domain dataset for human activity recognition using smartphones.,” in ESANN, 2013. [20] Tom Mitchell, “Cmu face images data set,” https: //archive.ics.uci.edu/ml/datasets/CMU+ Face+Images, 1997, Accessed: 2016-9-3. [21] “Semeion handwritten digit data set,” Semeion Research Center of Sciences of Communication, via Sersale 117, 00128 Rome, Italy Tattile Via Gaetano Donizetti, 1-3-5,25030 Mairano (Brescia), Italy.