This document proposes a privacy-preserving algorithm for backpropagation neural network learning when the training data is arbitrarily partitioned between two parties. Existing approaches only address vertically or horizontally partitioned data. The proposed algorithm keeps each party's data private during training, revealing only the final learned weights. It aims to be efficient in computation and communication overhead while providing strong privacy guarantees. The algorithm uses secure scalar product and techniques from previous work on vertically partitioned data to perform training without either party learning about the other's data.
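As a rough illustration of the secure-computation building blocks such protocols rely on, the following toy Python sketch shows a two-party additive secret-sharing sum, where only the combined value is revealed. This is an illustrative simplification, not the paper's actual secure scalar product protocol, and it is not hardened against active adversaries:

```python
import secrets

P = 2**61 - 1  # large prime modulus for the additive shares

def share(value, p=P):
    """Split a value into two additive shares modulo p."""
    r = secrets.randbelow(p)
    return r, (value - r) % p

# Each party holds a private contribution (e.g. a weight-update term).
alice_update, bob_update = 17, 25

# Each party splits its value and sends one share to the other party.
a_keep, a_send = share(alice_update)   # a_send goes to Bob
b_keep, b_send = share(bob_update)     # b_send goes to Alice

# Each party adds up the shares it holds and publishes only that partial sum.
alice_partial = (a_keep + b_send) % P
bob_partial = (b_keep + a_send) % P

# Recombining reveals the combined update, but not either party's input.
combined = (alice_partial + bob_partial) % P
print(combined)  # 42
```

Each published partial sum is uniformly random on its own, so neither party learns the other's contribution; only the recombined total is meaningful.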
Data mining is used to manage the huge amounts of information stored in data warehouses and databases and to discover the knowledge required from them. Numerous data mining techniques have been proposed, such as association rules, decision trees, neural networks, and clustering, and the field has been a focus of attention for many years. Among the available data mining strategies, clustering is one of the most effective: it groups a dataset into a number of clusters according to predefined criteria and is well suited to discovering relationships between the different attributes of the data.
In the k-means clustering algorithm, a function is selected according to its relevance for predicting the data, and the Euclidean distance between each cluster centroid and the data objects outside the cluster is computed to assign data points to clusters. In this work, the authors enhance the Euclidean distance formula to improve cluster quality.
The improved k-means still suffers from problems of accuracy and of redundant, dissimilar points within clusters, so a new enhanced approach is proposed that applies a similarity function to check the similarity level of each point before adding it to a cluster.
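The baseline being improved here can be sketched as a standard Euclidean-distance k-means in Python (a generic illustration, not the authors' enhanced variant):

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins the nearest centroid's cluster.
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(sorted(len(c) for c in clusters))  # [2, 2]
```

The enhanced approach described above would add a similarity check in the assignment step before admitting a point into a cluster.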
This document summarizes a research paper about privacy preserving data mining using implicit function theorem. The paper proposes a new approach for transforming sensitive data obtained from data mining systems into secure values. First, original data values are transformed into partial derivatives of vector-valued functions to perturb the data. Second, a symmetric key is generated from the Jacobian matrix eigenvalues for secure computation. The approach is intended to allow sharing of sensitive knowledge extracted from data mining in a private manner. An example using academic data is provided to illustrate converting data into vector functions. Results demonstrating the new approach are also presented.
PRIVACY PRESERVING DATA MINING BY USING IMPLICIT FUNCTION THEOREM (IJNSA Journal)
Data mining has become a broad, significant multidisciplinary field used across many application domains; it extracts knowledge by identifying structural relationships among the objects in large databases. Privacy-preserving data mining is a newer area of data mining research that aims to keep sensitive knowledge extracted from a data mining system private, so that it can be shared with the intended persons rather than being open for everyone to access. In this paper, we propose a new approach to privacy-preserving data mining that uses the implicit function theorem for the secure transformation of sensitive data obtained from a data mining system. The approach provides two-way enhanced security: first, the original values of the sensitive data are transformed into partial derivatives of vector-valued functions to perturb the data; second, a symmetric key is generated from the eigenvalues of the Jacobian matrix for secure computation. We give an example of converting sensitive academic data into vector-valued functions to illustrate the proposed concept, and we present implementation-based results for the new approach.
A Review on Privacy Preservation in Data Mining (ijujournal)
The main focus of privacy-preserving data publishing is to extend traditional data mining techniques for masking sensitive information through data modification. The major issues are how to modify the data and how to recover the data mining result from the altered data. The proposed methods are often tightly coupled with the data mining algorithms under consideration. Privacy-preserving data publishing focuses on techniques for publishing data, not techniques for data mining; it is expected that standard data mining techniques will be applied to the published data. Anonymization protects the data by hiding the identity of record owners, whereas privacy-preserving data mining seeks to directly conceal the sensitive data itself. This survey examines the various privacy preservation techniques and algorithms.
Additive Gaussian noise based data perturbation in multi level trust privacy ... (IJDKP)
This document discusses a technique called additive Gaussian noise based data perturbation for privacy preserving data mining. The technique introduces multiple perturbed copies of data for different trust levels of data miners to prevent diversity attacks. Gaussian noise is added to the original data and correlated between copies so that combining copies does not provide additional information about the original data. The goal is to limit what information adversaries can learn from individual or combined copies to within what the data owner intends to share, while still allowing accurate data mining. Experiments on banking customer data show the approach controls the normalized estimation error from individual and combined copies.
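The nesting of noise across trust levels can be sketched as follows, assuming (as in the standard multi-level construction) that the lower-trust copy reuses the higher-trust copy's noise plus extra independent noise; the exact noise covariance design in the paper may differ:

```python
import numpy as np

rng = np.random.default_rng(42)
original = rng.normal(50.0, 10.0, size=1000)   # synthetic attribute values

# Noise variances for two trust levels (lower trust -> stronger perturbation).
var_hi, var_lo = 4.0, 16.0

# The low-trust copy reuses the high-trust copy's noise plus extra independent
# noise, so averaging the two copies cannot cancel the perturbation.
noise_hi = rng.normal(0.0, np.sqrt(var_hi), size=original.shape)
extra = rng.normal(0.0, np.sqrt(var_lo - var_hi), size=original.shape)

copy_hi_trust = original + noise_hi
copy_lo_trust = original + noise_hi + extra
```

Because the low-trust noise contains the high-trust noise as a component, combining both copies gives an adversary no better estimate of the original data than the high-trust copy alone.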
An Investigation of Data Privacy and Utility Preservation Using KNN Classification as a Gauge (Kato Mivule)
Kato Mivule and Claude Turner, An Investigation of Data Privacy and Utility Preservation Using KNN Classification as a Gauge, International Conference on Information and Knowledge Engineering (IKE 2013), July 22-25, Pages 203-204, Las Vegas, NV, USA
Information Upload and retrieval using SP Theory of Intelligence (INFOGAIN PUBLICATION)
In today's technology, cloud computing has become an important aspect, and storing data on the cloud is of high importance, as the need for virtual space to store massive amounts of data has grown over the years. However, upload and download times are limited by processing time, so this issue must be addressed in order to handle large data and its processing. Another common problem is deduplication. As cloud services grow at a rapid rate, increasingly large volumes of data are stored on remote cloud servers, but many of the remotely stored files are duplicates, because the same file is uploaded by different users at different locations. A recent survey by EMC estimates that about 75% of the digital data on the cloud consists of duplicate copies. To overcome these two problems, this paper applies the SP theory of intelligence, using lossless compression of information, which makes big data smaller and thus reduces the problems of storing and managing large amounts of data.
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy (Kato Mivule)
Genomic data provides clinical researchers with vast opportunities to study various patient ailments. Yet the same data contains revealing information, some of which a patient might want to remain concealed. The question then arises: how can an entity transact in full DNA data while concealing certain sensitive pieces of information in the genome sequence, and maintain DNA data utility? As a response to this question, we propose a codon frequency obfuscation heuristic, in which a redistribution of codon frequency values with highly expressed genes is done in the same amino acid group, generating an obfuscated DNA sequence. Our preliminary results show that it might be possible to publish an obfuscated DNA sequence with a desired level of similarity (utility) to the original DNA sequence. http://arxiv.org/abs/1405.5410
Utilizing Noise Addition for Data Privacy, an Overview (Kato Mivule)
Kato Mivule, "Utilizing Noise Addition for Data Privacy, an Overview", Proceedings of the International Conference on Information and Knowledge Engineering (IKE 2012), Pages 65-71, Las Vegas, NV, USA.
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Using Machine Learning Classification as a Gauge (Kato Mivule)
Kato Mivule, Claude Turner, "A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Using Machine Learning Classification as a Gauge", Procedia Computer Science, Volume 20, 2013, Pages 414-419, Baltimore MD, USA
MDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATION (cscpconf)
Public and private organizations are collecting personal data about the day-to-day lives of individuals and accumulating it in large databases. Data mining techniques may be applied to such databases to extract useful hidden knowledge, but releasing the databases for data mining purposes may lead to breaches of individual privacy. The databases must therefore be protected by privacy preservation techniques before they are released for data mining. Microaggregation is a privacy preservation technique used by both the statistical disclosure control community and the data mining community for microdata protection. Maximum Distance to Average Vector (MDAV) is a very popular multivariate fixed-size microaggregation technique studied by many researchers. The principal goal of such techniques is to preserve privacy without much information loss. In this paper we propose a variable-size, improved MDAV technique with low information loss.
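A simplified version of the fixed-size grouping idea behind MDAV can be sketched in Python as follows (true MDAV also builds a second group around the record farthest from the first extreme record; this sketch omits that step and is only illustrative):

```python
import numpy as np

def mdav_fixed(data, k):
    """Simplified MDAV-style microaggregation: repeatedly form a group of k
    records around the record farthest from the running centroid, then
    replace every record by its group mean."""
    remaining = list(range(len(data)))
    groups = []
    while len(remaining) >= 2 * k:
        pts = data[remaining]
        centroid = pts.mean(axis=0)
        # Record farthest from the centroid of the remaining records.
        far = remaining[int(np.argmax(np.linalg.norm(pts - centroid, axis=1)))]
        # Its k nearest remaining records (including itself) form a group.
        d = np.linalg.norm(data[remaining] - data[far], axis=1)
        group = [remaining[i] for i in np.argsort(d)[:k]]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    groups.append(remaining)  # leftover records form the last group
    out = data.astype(float).copy()
    for g in groups:
        out[g] = data[g].mean(axis=0)  # publish group means, not raw records
    return out
```

Replacing each group by its mean preserves the overall mean of the data while making individual records indistinguishable within their group.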
Towards A Differential Privacy and Utility Preserving Machine Learning Classifier (Kato Mivule)
Kato Mivule, Claude Turner, Soo-Yeon Ji, "Towards A Differential Privacy and Utility Preserving Machine Learning Classifier", Procedia Computer Science (Complex Adaptive Systems), 2012, Pages 176-181, Washington DC, USA.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA... (IJDKP)
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone, and shopping records are regularly generated. Sharing these data has proved beneficial for data mining applications. On one hand, such data is an important asset for business decision-making through analysis. On the other hand, data privacy concerns may prevent data owners from sharing information for data analysis. To share data while preserving privacy, the data owner must come up with a solution that achieves the dual goals of privacy preservation and accuracy of the data mining task, i.e., clustering and classification. An efficient and effective approach is proposed that aims to protect the privacy of sensitive information while obtaining a data clustering with minimum information loss.
This document summarizes and compares different perturbation techniques for privacy-preserving data mining. It begins by describing value-based perturbation techniques like random noise addition and randomized responses, which aim to preserve statistical characteristics of data. It then covers data mining task-based techniques like condensation and random rotation perturbation that modify data to preserve properties important for specific mining tasks. Dimension reduction techniques like random projection that reduce dimensionality while maintaining privacy are also discussed. The document evaluates these techniques based on criteria like privacy loss, information loss, and ability to perform mining tasks on perturbed data. It concludes that perturbation is a popular privacy-preserving technique but achieving the right balance between privacy and utility remains a challenge.
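Random rotation perturbation, one of the task-based techniques mentioned above, can be sketched as follows; the key property is that an orthogonal transform preserves pairwise Euclidean distances, so distance-based mining tasks such as k-means or kNN behave identically on the perturbed data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 records x 5 attributes (synthetic)

# Build a random orthogonal matrix via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X_pert = X @ Q  # the published, rotated data

# Pairwise Euclidean distances are preserved exactly by the rotation.
d_orig = np.linalg.norm(X[0] - X[1])
d_pert = np.linalg.norm(X_pert[0] - X_pert[1])
```

Individual attribute values are scrambled across dimensions, yet the geometry the mining task depends on is untouched, which is exactly the privacy/utility trade-off the survey evaluates.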
On distributed fuzzy decision trees for big data (nexgentechnology)
Privacy Preservation and Restoration of Data Using Unrealized Data Sets (IJERA Editor)
In today's world, advances in hardware technology have increased the capability to store and record personal data about consumers and individuals. Data mining successfully extracts knowledge to support a variety of areas such as marketing, medical diagnosis, weather forecasting, and national security. Still, it is a challenge to extract certain kinds of data without violating the data owners' privacy. As data mining becomes more pervasive, such privacy concerns are increasing. This gives rise to a new category of data mining methods called privacy-preserving data mining (PPDM) algorithms. The aim of these algorithms is to protect the sensitive information within a large data set. The privacy preservation of a data set can be expressed in the form of a decision tree. This paper proposes a privacy preservation scheme based on data-set complement algorithms that store the information of the real dataset, so that the private data is safe from unauthorized parties; if some portion of the data is lost, the original data set can be recreated from the unrealized dataset and the perturbed data set.
The document summarizes a novel approach for privacy preserving data mining on continuous and discrete data sets. The approach converts original sample data sets into a group of "unreal" data sets from which the original samples cannot be reconstructed. However, an accurate decision tree can still be built directly from the unreal data sets. This protects privacy while maintaining data mining utility. The approach determines information entropy and generates a decision tree using the unreal data sets and a perturbing set, without reconstructing the original samples.
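The entropy computation at the heart of such decision-tree construction can be sketched generically (this shows standard Shannon entropy for a split criterion, not the paper's specific handling of unreal data sets):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels, as used to
    score candidate splits when growing a decision tree."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))  # 1.0 (maximally mixed)
print(entropy(["yes", "yes", "yes"]))       # 0.0 (pure node)
```

The cited approach computes this quantity from the unreal data sets and the perturbing set directly, so the tree can be grown without ever reconstructing the original samples.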
Double Key Encryption Method (DKEM) Algorithms Using ANN for Data Storing and... (IOSR Journals)
This document proposes a double key encryption method using artificial neural networks for securely storing and retrieving data in cloud computing. The method uses a backpropagation neural network for key generation and to transform plaintext into an encrypted form for storage. Encryption occurs in two stages - first using a transformation key to produce encrypted input, then a second encryption key to generate ciphertext. For retrieval, the process is reversed - decrypting with keys then transforming back to plaintext. The goal is to enhance cloud data security even if an intruder obtains the ciphertext, as they could not decrypt without both keys.
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr... (Kato Mivule)
Literature Review – Talk, By Kato Mivule, COSC891 Fall 2013, Computer Science Department, Bowie State University
"Signal Processing and Machine Learning with Differential Privacy Algorithms and challenges for continuous data" Sarwate and Chaudhuri (2013)
With the surge of modern research focus towards pervasive computing, many techniques and challenges need to be addressed in order to effectively create smart spaces and achieve miniaturization. In the process of scaling down to compact devices, the real issues to ponder are the information retrieval challenges. In this work, we discuss the aspects of multimedia that make information access challenging. An example pattern recognition scenario is presented, along with the mathematical techniques that can be used to model uncertainty, for developing a system that can sense, compute, and communicate in a way that makes human life easier, with smart objects assisting from the surroundings.
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer (IJERA Editor)
An institution is a place where a teacher explains and students understand and learn the lesson. Every student has his or her own sense of what is difficult or easy, and there is no absolute scale for measuring knowledge, but examination scores indicate a student's performance. In this case study, data mining techniques are combined with educational strategies to improve students' performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data: it allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in a large relational database. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). This project describes the use of the clustering data mining technique to improve the efficiency of academic performance in educational institutions. A live experiment was conducted on students: an exam was given to computer science students using MOODLE (an LMS), the resulting data was analysed using RapidMiner (data mining software), and clustering was then performed on the data. This method helps to identify the students who need special advising or counselling from the teacher, in order to provide a high quality of education.
Privacy Preserving Clustering on Distorted data (IOSR Journals)
- The document discusses privacy-preserving clustering on distorted data using singular value decomposition (SVD) and sparsified singular value decomposition (SSVD).
- It applies SVD and SSVD to distort a real-world dataset of 100 terrorists with 42 attributes, generating distorted datasets.
- K-means clustering is then performed on the original and distorted datasets for different numbers of clusters (k). The results show that SSVD more effectively groups the data objects into clusters compared to the original and SVD-distorted datasets, while preserving data privacy as measured by various metrics.
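The SVD and sparsified-SVD distortion step can be sketched as follows on synthetic data of the same shape (100 records, 42 attributes); the rank and the sparsification threshold here are arbitrary illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 42))  # e.g. 100 records x 42 attributes

# SVD distortion: keep only the top-r singular components as the published data.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 10
A_svd = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Sparsified SVD (SSVD): additionally zero out small entries of the factors,
# distorting the data further while keeping its dominant structure.
eps = 0.02
Us = np.where(np.abs(U[:, :r]) < eps, 0.0, U[:, :r])
Vs = np.where(np.abs(Vt[:r, :]) < eps, 0.0, Vt[:r, :])
A_ssvd = Us @ np.diag(s[:r]) @ Vs
```

Because the dominant singular components capture the cluster structure, k-means on the distorted matrices can still group records similarly to the original data, which is the utility claim the paper measures.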
Implementation of Data Privacy and Security in an Online Student Health Records System (Kato Mivule)
Kato Mivule, Stephen Otunba, Tattwamasi Tripathy, and Sharad Sharma, "Implementation of Data Privacy and Security in an Online Student Health Records System", Proceedings of the ISCA 21st Int. Conf. on Software Engineering and Data Engineering (SEDE-2012), Pages 143-148, Los Angeles, CA, USA
Towards A More Secure Web Based Tele Radiology System: A Steganographic Approach (CSCJournals)
While it is possible to make a patient's medical images available to a practicing radiologist online, e.g. through open network systems interconnectivity and email attachments, these methods do not guarantee the security, confidentiality and tamper-free reliability required in a medical information system infrastructure. The possibility of securely and covertly transmitting such medical images remotely for clinical interpretation and diagnosis through a secure steganographic technique was the focus of this study.
We propose a method that uses an Enhanced Least Significant Bit (ELSB) steganographic insertion method to embed a patient's Medical Image (MI) in the spatial domain of a cover digital image and his/her health records in the frequency domain of the same cover image as a watermark to ensure tamper detection and non-repudiation. The ELSB method uses the Mersenne Twister (MT) Pseudo Random Number Generator (PRNG) to randomly embed and conceal the patient's data in the cover image. This technique significantly increases the imperceptibility of the hidden information to steganalysis, thereby enhancing the security of the embedded patient's data.
In measuring the effectiveness of the proposed method, the study adopted the Design Science Research (DSR) methodology, a paradigm for problem solving in computing and Information Systems (IS) that involves design and implementation of artefacts and methods considered novel and the analytical testing of the performance of such artefacts in pursuit of understanding and enhancing an existing method, artefact or practice.
The fidelity measures of the stego images from the proposed method were compared with those from the traditional Least Significant Bit (LSB) method in order to establish the imperceptibility of the embedded information. The results demonstrated improvements of between 1 and 2.6 decibels (dB) in the Peak Signal to Noise Ratio (PSNR), and up to 0.4 in MSE ratios, for the proposed method.
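The core ELSB idea of key-driven pseudo-random embedding can be sketched as follows; Python's `random.Random` is itself a Mersenne Twister, so it stands in for the MT PRNG, though the actual ELSB method operates on image pixel arrays rather than a plain list of values:

```python
import random

def embed_lsb(pixels, message_bits, key):
    """Embed message bits in the LSBs of pseudo-randomly chosen pixels.
    The same key regenerates the pixel order for extraction."""
    out = list(pixels)
    rng = random.Random(key)  # Python's random.Random is a Mersenne Twister
    positions = rng.sample(range(len(out)), len(message_bits))
    for pos, bit in zip(positions, message_bits):
        out[pos] = (out[pos] & ~1) | bit  # overwrite only the least significant bit
    return out

def extract_lsb(pixels, n_bits, key):
    """Re-derive the same pseudo-random positions and read back the LSBs."""
    rng = random.Random(key)
    positions = rng.sample(range(len(pixels)), n_bits)
    return [pixels[pos] & 1 for pos in positions]

cover = [120, 33, 215, 7, 88, 164, 42, 250]  # toy stand-in for pixel values
bits = [1, 0, 1, 1]
stego = embed_lsb(cover, bits, key=1234)
print(extract_lsb(stego, 4, key=1234))  # [1, 0, 1, 1]
```

Each pixel changes by at most one intensity level, and without the key an analyst cannot tell which pixels carry payload bits, which is what improves resistance to steganalysis over sequential LSB embedding.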
This document discusses applying a neural network approach to decision making in a self-organizing computing network (SOCN). It proposes using concepts from fuzzy logic and neural networks to build a computing network that can handle mixed data types, like symbolic and numeric data. The network would have input, hidden, and output layers connected by transfer functions. The hidden cells would self-organize based on training data to learn relationships between input and output cells. This approach aims to allow the network to make decisions on data sets with diverse attribute types in a more effective way than other techniques.
This document discusses improved K-means clustering techniques. It begins with an introduction to data mining and clustering. K-means clustering groups data objects into k clusters by minimizing distances between objects and cluster centers. However, K-means has limitations such as dependency on initialization. The document proposes a new clustering algorithm that uses an iterative relocation technique and distance determination approach to improve upon K-means clustering. It compares the computational complexity of K-means and K-medoids clustering algorithms.
Privacy preserving back propagation neural network learning over arbitrarily ... (IEEEFINALYEARPROJECTS)
Using Randomized Response Techniques for Privacy-Preserving Data Mining
This document proposes using randomized response techniques to conduct privacy-preserving data mining and build decision tree classifiers from disguised data. It presents a method called Multivariate Randomized Response (MRR) that extends randomized response to handle multiple attributes. Experiments show that while the data is disguised, decision trees built from it can still achieve high accuracy compared to trees built from original data, if the randomization parameter is chosen appropriately. The accuracy is affected by this randomization parameter.
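Warner's classic single-attribute randomized response, which MRR generalizes to multiple attributes, can be sketched as follows (the multivariate MRR extension itself is not shown):

```python
import random

def randomized_response(truth, theta, rng):
    """Report the true answer with probability theta, else the opposite.
    Each respondent's real answer stays plausibly deniable."""
    return truth if rng.random() < theta else not truth

def estimate_proportion(reports, theta):
    """Unbiased estimate of the true 'yes' proportion from disguised reports:
    P(report yes) = p*theta + (1-p)*(1-theta), solved for p."""
    p_yes = sum(reports) / len(reports)
    return (p_yes + theta - 1) / (2 * theta - 1)

rng = random.Random(7)
true_answers = [rng.random() < 0.3 for _ in range(100_000)]  # ~30% real 'yes'
theta = 0.8  # the randomization parameter that trades privacy for accuracy
reports = [randomized_response(t, theta, rng) for t in true_answers]
estimate = estimate_proportion(reports, theta)
```

As theta approaches 0.5 the reports become pure noise (maximum privacy, no utility), and as it approaches 1 the data is barely disguised, which is the accuracy/privacy trade-off the experiments above quantify.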
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data PrivacyKato Mivule
Genomic data provides clinical researchers with vast opportunities to study various patient ailments. Yet the same data contains revealing information, some of which a patient might want to remain concealed. The question then arises: how can an entity transact in full DNA data while concealing certain sensitive pieces of information in the genome sequence, and maintain DNA data utility? As a response to this question, we propose a codon frequency obfuscation heuristic, in which a redistribution of codon frequency values with highly expressed genes is done in the same amino acid group, generating an obfuscated DNA sequence. Our preliminary results show that it might be possible to publish an obfuscated DNA sequence with a desired level of similarity (utility) to the original DNA sequence. http://arxiv.org/abs/1405.5410
Kato Mivule - Utilizing Noise Addition for Data Privacy, an OverviewKato Mivule
Kato Mivule, "Utilizing Noise Addition for Data Privacy, an Overview", Proceedings of the International Conference on Information and Knowledge Engineering (IKE 2012), Pages 65-71, Las Vegas, NV, USA.
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...Kato Mivule
Kato Mivule, Claude Turner, "A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Using Machine Learning Classification as a Gauge", Procedia Computer Science, Volume 20, 2013, Pages 414-419, Baltimore MD, USA
MDAV2K: A VARIABLE-SIZE MICROAGGREGATION TECHNIQUE FOR PRIVACY PRESERVATIONcscpconf
Public and private organizations are collecting personal data regarding the day-to-day life of individuals and accumulating it in large databases. Data mining techniques may be applied to such databases to extract useful hidden knowledge. Releasing the databases for data mining purposes may lead to a breach of individual privacy, so the databases must be protected by means of privacy preservation techniques before release. Microaggregation is a privacy preservation technique used by the statistical disclosure control community as well as the data mining community for microdata protection. The Maximum Distance to Average Vector (MDAV) method is a very popular multivariate fixed-size microaggregation technique studied by many researchers. The principal goal of such techniques is to preserve privacy without much information loss. In this paper we propose a variable-size, improved MDAV technique having low information loss.
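As a rough illustration of the fixed-size MDAV baseline that the abstract improves on, here is a hedged sketch: groups of k records are built around the records farthest from the running centroid, and each record is then published as its group centroid. The data and k are invented for the example, and this follows the usual MDAV outline rather than the authors' variable-size variant:

```python
import math

def centroid(pts):
    dim = len(pts[0])
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))

def mdav(records, k):
    """Fixed-size MDAV sketch: build k-record groups around the records
    farthest from the centroid, until fewer than 3k records remain."""
    pool = list(records)
    groups = []
    while len(pool) >= 3 * k:
        c = centroid(pool)
        r = max(pool, key=lambda p: math.dist(p, c))   # farthest from centroid
        s = max(pool, key=lambda p: math.dist(p, r))   # farthest from r
        for ref in (r, s):
            pool.sort(key=lambda p: math.dist(p, ref))
            groups.append(pool[:k])                    # k records nearest ref
            pool = pool[k:]
    if len(pool) >= 2 * k:                             # room for one more group
        r = max(pool, key=lambda p: math.dist(p, centroid(pool)))
        pool.sort(key=lambda p: math.dist(p, r))
        groups.append(pool[:k])
        pool = pool[k:]
    if pool:
        groups.append(pool)                            # final group, size k..2k-1
    return groups

def microaggregate(records, k):
    """Publish each record as its group centroid instead of its raw values."""
    out = []
    for g in mdav(records, k):
        c = centroid(g)
        out.extend([c] * len(g))
    return out

data = [(float(i), float(i % 5)) for i in range(17)]
groups = mdav(data, k=3)
agg = microaggregate(data, k=3)
```

Every published group has at least k members, which is exactly the k-anonymity-style guarantee microaggregation provides; the information loss the paper targets is the within-group spread discarded by the centroid replacement.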
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...Kato Mivule
Kato Mivule, Claude Turner, Soo-Yeon Ji, "Towards A Differential Privacy and Utility Preserving Machine Learning Classifier", Procedia Computer Science (Complex Adaptive Systems), 2012, Pages 176-181, Washington DC, USA.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone, and shopping records are regularly generated. Sharing these data has proved beneficial for data mining applications. On one hand, such data is an important asset for business decision making through analysis. On the other hand, data privacy concerns may prevent data owners from sharing information for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution that achieves the dual goals of privacy preservation and accuracy of the data mining tasks of clustering and classification. An efficient and effective approach has been proposed that aims to protect the privacy of sensitive information while obtaining data clustering with minimum information loss.
This document summarizes and compares different perturbation techniques for privacy-preserving data mining. It begins by describing value-based perturbation techniques like random noise addition and randomized responses, which aim to preserve statistical characteristics of data. It then covers data mining task-based techniques like condensation and random rotation perturbation that modify data to preserve properties important for specific mining tasks. Dimension reduction techniques like random projection that reduce dimensionality while maintaining privacy are also discussed. The document evaluates these techniques based on criteria like privacy loss, information loss, and ability to perform mining tasks on perturbed data. It concludes that perturbation is a popular privacy-preserving technique but achieving the right balance between privacy and utility remains a challenge.
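The value-based noise-addition technique mentioned first can be illustrated in a few lines: zero-mean noise masks individual values while aggregate statistics survive. The salary data and noise scale below are invented for the example:

```python
import random
import statistics

def perturb(values, sigma):
    """Add zero-mean Gaussian noise: each individual value is disguised,
    but the mean of the column is approximately preserved."""
    return [v + random.gauss(0.0, sigma) for v in values]

random.seed(7)
salaries = [random.uniform(30_000, 90_000) for _ in range(20_000)]
noisy = perturb(salaries, sigma=10_000)
orig_mean = statistics.fmean(salaries)
noisy_mean = statistics.fmean(noisy)
```

The privacy/utility tension the document evaluates is visible in `sigma`: larger noise means more privacy for each record but larger error in any statistic a miner computes from the perturbed column.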
On distributed fuzzy decision trees for big datanexgentechnology
Privacy Preservation and Restoration of Data Using Unrealized Data SetsIJERA Editor
In today's world, advances in hardware technology have increased the capability to store and record personal data about consumers and individuals. Data mining extracts knowledge to successfully support a variety of areas such as marketing, medical diagnosis, weather forecasting, national security, etc. Still, there is a challenge to extract certain kinds of data without violating the data owners' privacy. As data mining becomes more pervasive, such privacy concerns are increasing. This gives birth to a new category of data mining methods called privacy preserving data mining (PPDM) algorithms. The aim of these algorithms is to protect the sensitive information within the larger data set. The privacy preservation of a data set can be expressed in the form of a decision tree. This paper proposes a privacy preservation approach based on data set complement algorithms which store the information of the real dataset, so that the private data is safe from unauthorized parties; if some portion of the data is lost, the original data set can be recreated from the unrealized dataset and the perturbed data set.
The document summarizes a novel approach for privacy preserving data mining on continuous and discrete data sets. The approach converts original sample data sets into a group of "unreal" data sets from which the original samples cannot be reconstructed. However, an accurate decision tree can still be built directly from the unreal data sets. This protects privacy while maintaining data mining utility. The approach determines information entropy and generates a decision tree using the unreal data sets and a perturbing set, without reconstructing the original samples.
Double Key Encryption Method (DKEM) Algorithms Using ANN for Data Storing and...IOSR Journals
This document proposes a double key encryption method using artificial neural networks for securely storing and retrieving data in cloud computing. The method uses a backpropagation neural network for key generation and to transform plaintext into an encrypted form for storage. Encryption occurs in two stages - first using a transformation key to produce encrypted input, then a second encryption key to generate ciphertext. For retrieval, the process is reversed - decrypting with keys then transforming back to plaintext. The goal is to enhance cloud data security even if an intruder obtains the ciphertext, as they could not decrypt without both keys.
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...Kato Mivule
Literature Review – Talk, By Kato Mivule, COSC891 Fall 2013, Computer Science Department, Bowie State University
"Signal Processing and Machine Learning with Differential Privacy Algorithms and challenges for continuous data" Sarwate and Chaudhuri (2013)
With the surge in modern research focus towards Pervasive Computing, many techniques and challenges need to be addressed in order to effectively create smart spaces and achieve miniaturization. In the process of scaling down to compact devices, the real things to ponder are the information retrieval challenges. In this work, we discuss the aspects of multimedia that make information access challenging. An example pattern recognition scenario is presented, and the mathematical techniques that can be used to model uncertainty are also presented, for developing a system that can sense, compute and communicate in a way that makes human life easier with smart objects assisting from the surroundings.
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
An institution is a place where the teacher explains and the student understands and learns the lesson. Every student has his own definition of toughness and easiness, and there isn't any absolute scale for measuring knowledge, but examination scores indicate the performance of the student. In this case study, knowledge of data mining is combined with educational strategies to improve students' performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data; it allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in a large relational database. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). This project describes the use of the clustering data mining technique to improve the efficiency of academic performance in educational institutions. A live experiment was conducted on students of a computer science major by administering an exam using MOODLE (an LMS), analysing the generated data using RapidMiner (data mining software), and then performing clustering on the data. This method helps to identify the students who need special advising or counselling by the teacher, in order to provide a high quality of education.
Privacy Preserving Clustering on Distorted dataIOSR Journals
- The document discusses privacy-preserving clustering on distorted data using singular value decomposition (SVD) and sparsified singular value decomposition (SSVD).
- It applies SVD and SSVD to distort a real-world dataset of 100 terrorists with 42 attributes, generating distorted datasets.
- K-means clustering is then performed on the original and distorted datasets for different numbers of clusters (k). The results show that SSVD more effectively groups the data objects into clusters compared to the original and SVD-distorted datasets, while preserving data privacy as measured by various metrics.
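The rank-reduction idea behind SVD-based distortion can be sketched without any library: replace the data matrix by a low-rank approximation, so record-level detail is discarded while the dominant structure that clustering relies on survives. This toy keeps only the dominant singular component (a rank-1 stand-in for the document's SVD/SSVD pipeline, not the authors' actual procedure):

```python
def rank1_approx(M, iters=200):
    """Distort a data matrix by replacing it with its best rank-1
    approximation. The dominant singular triple (u, sigma, v) is found
    by power iteration; everything beyond it -- the confidential
    record-level detail -- is dropped."""
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols
    for _ in range(iters):
        # u <- M v, normalized
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        nu = sum(x * x for x in u) ** 0.5
        u = [x / nu for x in u]
        # v <- M^T u, normalized
        v = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        nv = sum(x * x for x in v) ** 0.5
        v = [x / nv for x in v]
    sigma = sum(M[i][j] * u[i] * v[j] for i in range(rows) for j in range(cols))
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]

A = rank1_approx([[2.0, 0.0], [0.0, 1.0]])
```

On this tiny diagonal example the smaller singular direction is zeroed out entirely; on real data, a rank-k truncation (plus sparsification, in SSVD) plays the same role at larger scale.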
International Journal of Engineering Research and Development (IJERD)IJERD Editor
Implementation of Data Privacy and Security in an Online Student Health Recor...Kato Mivule
Kato Mivule, Stephen Otunba, Tattwamasi Tripathy, Sharad and Sharma, "Implementation of Data Privacy and Security in an Online Student Health Records System", Proceedings at the ISCA 21th Int Conf on Software Engineering and Data Engineering (SEDE-2012), Pages 143-148, Los Angeles, CA, USA
Towards A More Secure Web Based Tele Radiology System: A Steganographic ApproachCSCJournals
While it is possible to make a patient's medical images available to a practicing radiologist online e.g. through open network systems inter connectivity and email attachments, these methods don't guarantee the security, confidentiality and tamper free reliability required in a medical information system infrastructure. The possibility of securely and covertly transmitting such medical images remotely for clinical interpretation and diagnosis through a secure steganographic technique was the focus of this study.
We propose a method that uses an Enhanced Least Significant Bit (ELSB) steganographic insertion method to embed a patient's Medical Image (MI) in the spatial domain of a cover digital image and his/her health records in the frequency domain of the same cover image as a watermark to ensure tamper detection and nonrepudiation. The ELSB method uses the Mersenne Twister (MT) Pseudo Random Number Generator (PRNG) to randomly embed and conceal the patient's data in the cover image. This technique significantly increases the imperceptibility of the hidden information to steganalysis, thereby enhancing the security of the embedded patient's data.
In measuring the effectiveness of the proposed method, the study adopted the Design Science Research (DSR) methodology, a paradigm for problem solving in computing and Information Systems (IS) that involves design and implementation of artefacts and methods considered novel and the analytical testing of the performance of such artefacts in pursuit of understanding and enhancing an existing method, artefact or practice.
The fidelity measures of the stego images from the proposed method were compared with those from the traditional Least Significant Bit (LSB) method in order to establish the imperceptibility of the embedded information. The results demonstrated improvements of between 1 to 2.6 decibels (dB) in the Peak Signal to Noise Ratio (PSNR), and up to 0.4 MSE ratios for the proposed method.
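Python's own PRNG happens to be a Mersenne Twister, so the seeded-position idea behind ELSB can be sketched directly: a shared seed determines which pixels carry payload bits, and without it an attacker does not know where to look. The "pixels" and payload below are invented, and real ELSB operates on actual image data:

```python
import random

def embed(cover, bits, seed):
    """LSB embedding at PRNG-chosen positions: only a holder of the seed
    (a shared key) can locate the hidden bits."""
    stego = list(cover)
    positions = random.Random(seed).sample(range(len(cover)), len(bits))
    for pos, bit in zip(positions, bits):
        stego[pos] = (stego[pos] & ~1) | bit   # overwrite least significant bit
    return stego

def extract(stego, nbits, seed):
    """Re-derive the same positions from the seed and read the LSBs back."""
    positions = random.Random(seed).sample(range(len(stego)), nbits)
    return [stego[pos] & 1 for pos in positions]

cover = list(range(50, 114))            # 64 fake 8-bit "pixel" values
payload = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(cover, payload, seed=42)
recovered = extract(stego, len(payload), seed=42)
```

Each embedded bit changes a pixel value by at most 1, which is why LSB-family methods score well on PSNR fidelity measures like those reported in the study.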
This document discusses applying a neural network approach to decision making in a self-organizing computing network (SOCN). It proposes using concepts from fuzzy logic and neural networks to build a computing network that can handle mixed data types, like symbolic and numeric data. The network would have input, hidden, and output layers connected by transfer functions. The hidden cells would self-organize based on training data to learn relationships between input and output cells. This approach aims to allow the network to make decisions on data sets with diverse attribute types in a more effective way than other techniques.
With the development of databases, the volume of stored data increases rapidly, and much important information is hidden in these large amounts of data. If that information can be extracted from the database, it will create a lot of profit for the organization. The question organizations are asking is how to extract this value; the answer is data mining. There are many technologies available to data mining practitioners, including artificial neural networks, genetic algorithms, fuzzy logic and decision trees. Many practitioners are wary of neural networks due to their black-box nature, even though they have proven themselves in many situations. This paper is an overview of artificial neural networks and questions their position as a preferred tool of data mining practitioners.
Analytical Study On Artificial Neural NetworkKristen Carter
This document analyzes the use of artificial neural networks in cryptography. It discusses how neural networks can be used for secret key exchange and encryption. Specifically, it examines how synchronizing two neural networks through mutual learning can generate a secret key that is then used to encrypt and decrypt messages. The document also reviews various soft computing techniques like fuzzy logic that can be integrated with neural networks for cryptography applications. It concludes that artificial neural networks have potential for building more secure and effective cryptographic systems by overcoming some of the drawbacks of traditional approaches.
Back-Propagation Neural Network Learning with Preserved Privacy using Cloud C...IOSR Journals
The document discusses a method for multiple parties to jointly conduct backpropagation neural network learning on their private datasets using cloud computing while preserving privacy. Each party encrypts their dataset using AES encryption and uploads the ciphertexts to the cloud. The cloud then homomorphically encrypts the ciphertexts using BGN encryption to allow operations like addition and multiplication to be performed without revealing the plaintexts. The cloud runs the backpropagation learning algorithm over the encrypted data to train the neural network in a collaborative manner. This allows private datasets from different parties to be jointly learned from while ensuring the privacy of each party's original data is maintained.
This document summarizes a research paper on conducting back-propagation neural network learning while preserving data privacy using cloud computing. It discusses how multiple parties can jointly train a neural network on their combined datasets without disclosing their private data. The proposed solution encrypts each party's data using AES before uploading it to the cloud. The cloud then performs operations on the encrypted data using a homomorphic encryption algorithm during backpropagation training. This allows the neural network to learn from the combined datasets without the cloud learning the actual contents. The document outlines the system model, security model, data encryption and partitioning steps, and backpropagation algorithm used to enable collaborative and privacy-preserving learning across multiple parties' datasets using cloud resources.
This document discusses image encryption using a chaotic artificial neural network. It begins by introducing image encryption and its importance for securely transmitting valuable data over the internet. It then provides background on encryption techniques and discusses how image encryption works. The document outlines chaotic cryptography and why characteristics of chaos make it suitable for cryptography. It also discusses artificial neural networks and how they can be used for image encryption. In particular, it describes using a feedforward neural network with hidden layers to encrypt images.
This document proposes a new approach for preserving sensitive data privacy when clustering data. It involves adding noise to numeric attributes using a fuzzy membership function, which distorts the data while maintaining the original clusters. The fuzzy membership function uses an S-shaped curve to map original attribute values to modified values, and clustering is then performed on the distorted data. This method is compared to other privacy preservation techniques, including cryptographic methods, data swapping and noise addition, and is found to reduce processing time. The document also reviews the literature on privacy preservation, covering data modification, cryptography, and data reconstruction methods, before describing the proposed use of a fuzzy membership function to perturb sensitive attributes prior to clustering.
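Zadeh's standard S-function matches the S-shaped curve the summary mentions; since it is monotone on its interval, distances and relative order (and hence cluster structure) are preserved while raw magnitudes are hidden. A small sketch, with the interval [a, b] and sample values chosen purely for illustration:

```python
def s_membership(x, a, b):
    """Zadeh's S-shaped membership function on [a, b]:
    0 below a, 1 above b, quadratic pieces joined at the midpoint."""
    m = (a + b) / 2
    if x <= a:
        return 0.0
    if x <= m:
        return 2 * ((x - a) / (b - a)) ** 2
    if x < b:
        return 1.0 - 2 * ((x - b) / (b - a)) ** 2
    return 1.0

def distort(values, a, b):
    """Replace each sensitive value by its membership degree: a monotone
    squashing that hides magnitudes but keeps relative order inside [a, b]."""
    return [s_membership(v, a, b) for v in values]

vals = [2, 4, 6, 8]
d = distort(vals, 0, 10)
```

Because the mapping is order-preserving, a clustering algorithm run on the distorted column groups the same records together as it would on the original.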
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This paper proposes a classification-based approach for suppressing data to prevent sensitive information from being inferred. It uses decision tree algorithms to classify data elements based on attributes and considers suppressing data elements to secure the data. The paper aims to enhance data classification and generalization. It shows how data can be secured using "generalization" while maintaining usefulness for data mining tasks. The proposed system focuses on data generalization concepts to hide detailed information for privacy while allowing standard data mining techniques to still discover patterns. It evaluates suppressing multiple confidential values and developing a technique independent of individual classification methods based on information theory.
In this era, there is a need to secure data in distributed database systems. For collaborative data publishing, anonymization techniques such as generalization and bucketization are available. We consider an attack, called the "insider attack", by colluding data providers who may use their own records to infer others' records. To protect the database from these types of attacks we use a slicing technique for anonymization, as the above techniques are not suitable for high-dimensional data: they cause loss of data and also need a clear separation of quasi-identifiers and sensitive attributes. We consider this threat and make several contributions. First, we introduce a notion of data privacy and use a slicing technique, which classifies data vertically and horizontally, to show that the anonymized data satisfies privacy and security requirements. Second, we present verification algorithms which prove security against any number of data providers and ensure high utility and privacy of the anonymized data with efficiency. For experimental results we use hospital patient datasets; the results suggest that our slicing approach achieves better or comparable utility and efficiency than baseline algorithms while satisfying data security. Our experiment also demonstrates the difference in computation time between the encryption algorithm used to secure data and our system.
This document provides an overview and summary of a student project report on simulating a feed forward artificial neural network in C++. The report includes an abstract, table of contents, list of figures, and 5 chapters that discuss the objectives of the project, provide background on artificial neural networks, describe the design and implementation of a 3-layer feed forward neural network using backpropagation, present the results, and provide references. The design section explains the backpropagation algorithm and provides pseudocode for calculating outputs at each layer. The implementation section provides pseudocode for training patterns and minimizing error.
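The backpropagation pseudocode the report describes can be condensed into a runnable 3-layer sketch. The hidden size, learning rate, and the XOR toy task are choices made for this example, not taken from the report:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """A 2-input, 4-hidden, 1-output feed-forward network trained by
    back-propagation (online gradient descent on squared error)."""
    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        rnd = lambda: rng.uniform(-1.0, 1.0)
        self.w1 = [[rnd(), rnd()] for _ in range(hidden)]  # input -> hidden
        self.b1 = [rnd() for _ in range(hidden)]
        self.w2 = [rnd() for _ in range(hidden)]           # hidden -> output
        self.b2 = rnd()

    def forward(self, x):
        self.h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
                  for w, b in zip(self.w1, self.b1)]
        self.o = sigmoid(sum(w * h for w, h in zip(self.w2, self.h)) + self.b2)
        return self.o

    def train_step(self, x, target, lr=0.5):
        o = self.forward(x)
        d_o = (o - target) * o * (1.0 - o)                 # output-layer delta
        d_h = [d_o * w * h * (1.0 - h)                     # hidden deltas
               for w, h in zip(self.w2, self.h)]
        for j in range(len(self.w2)):
            self.w2[j] -= lr * d_o * self.h[j]
            self.w1[j][0] -= lr * d_h[j] * x[0]
            self.w1[j][1] -= lr * d_h[j] * x[1]
            self.b1[j] -= lr * d_h[j]
        self.b2 -= lr * d_o

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR

def total_error(net):
    return sum((net.forward(x) - t) ** 2 for x, t in data)

net = TinyNet()
before = total_error(net)
for _ in range(5000):                 # epochs of online training
    for x, t in data:
        net.train_step(x, t)
after = total_error(net)
```

The two delta computations in `train_step` are exactly the error-minimization step the report's pseudocode describes: the output error is scaled by the sigmoid derivative, then propagated backwards through the output weights to the hidden layer.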
Efficient technique for privacy preserving publishing of set valued data on c...ElavarasaN GanesaN
The document proposes a technique for privacy-preserving publishing of set-valued data on cloud computing. It extends the existing Extended Quasi Identifier Partitioning (EQI-partitioning) technique by incorporating l-diversity and k-anonymity to reduce information loss. A multi-level accessibility model is also developed to provide security based on user access levels. Identity-based proxy re-encryption is used to encrypt the data according to sensitivity values and provide access to different user levels. The proposed method aims to reduce information loss while improving security when outsourcing sensitive set-valued data to the cloud.
In this work, the TREPAN algorithm is enhanced and extended for extracting decision trees from neural networks. We empirically evaluated the performance of the algorithm on a set of databases from real world events. This benchmark enhancement was achieved by adapting Single-test TREPAN and C4.5 decision tree induction algorithms to analyze the datasets. The models are then compared with X-TREPAN for comprehensibility and classification accuracy. Furthermore, we validate the experimentations by applying statistical methods. Finally, the modified algorithm is extended to work with multi-class regression problems, and the ability to comprehend generalized feed forward networks is achieved.
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERIJCSEA Journal
A comparison study of algorithms is very much required before implementing them for the needs of any organization. Comparisons of algorithms depend on various parameters such as data frequency, types of data, and relationships among the attributes in a given data set. A number of learning and classification algorithms are available to analyse data, learn patterns and categorize it. The problem is to find the best algorithm for a given problem and desired output; the desired result has always been higher accuracy in predicting future values or events from the given dataset. The algorithms taken for this comparison study are Neural Net, SVM, Naïve Bayes, BFT and Decision Stump. These are among the most influential data mining algorithms in the research community and are widely used in the field of knowledge discovery and data mining.
Enabling Use of Dynamic Anonymization for Enhanced Security in CloudIOSR Journals
This document summarizes various techniques for anonymizing data to protect privacy and security when data is stored in the cloud. It discusses how anonymization removes identifying attributes from data to prevent individuals from being identified. The document reviews existing anonymization models like k-anonymity, l-diversity and t-closeness. It then describes different anonymization techniques like hashing, hiding, permutation, shifting, truncation, prefix-preserving and enumeration that were implemented to anonymize data fields. The goal is to anonymize data in a way that balances privacy, security, and the ability to still use the data for appropriate purposes.
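A few of the listed field-level techniques (hashing, truncation, generalization) in sketch form; the salt, field names and band width are illustrative choices, not from the document:

```python
import hashlib

def hash_field(value: str, salt: str = "s3cret") -> str:
    """One-way salted hash: equality joins remain possible,
    but the raw identifier is not recoverable."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

def truncate_zip(zip_code: str, keep: int = 3) -> str:
    """Truncation: keep only a coarse prefix of the value."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def generalize_age(age: int, width: int = 10) -> str:
    """Generalization into fixed-width bands (k-anonymity style)."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

record = {"name": "Alice", "zip": "21250", "age": 34}
anon = {
    "name": hash_field(record["name"]),
    "zip": truncate_zip(record["zip"]),
    "age": generalize_age(record["age"]),
}
```

Each transform trades off differently: hashing keeps exact-match utility, truncation and banding keep approximate locality, and all three remove the directly identifying value, which is the balance the document discusses.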
A Review on Privacy Preservation in Data Miningijujournal
This document summarizes several techniques for privacy preservation in data mining and machine learning. It reviews algorithms for anonymizing social networks and preserving privacy in association rule mining. It also evaluates approaches for privacy-preserving decision trees, support vector machines, gradient descent methods, and data analysis. Overall, the document surveys and compares various algorithms and methods for protecting sensitive information while still enabling useful data mining and analysis.
A Review on Privacy Preservation in Data Miningijujournal
The main focus of privacy preserving data publishing is to enhance traditional data mining techniques for masking sensitive information through data modification. The major issues are how to modify the data and how to recover the data mining result from the altered data. The reported solutions are often tightly coupled with the data mining algorithms under consideration. Privacy preserving data publishing focuses on techniques for publishing data, not techniques for data mining; it is expected that standard data mining techniques will be applied to the published data. Anonymization of the data is done by hiding the identity of record owners, whereas privacy preserving data mining seeks to directly hide the sensitive data. This survey covers the various privacy preservation techniques and algorithms.
Privacy Preserving Back-Propagation Neural
Network Learning Over Arbitrarily Partitioned Data
Ankur Bansal Tingting Chen Sheng Zhong
Computer Science and Engineering Department
State University of New York at Buffalo
Amherst, NY 14260, U. S. A
Email : {abansal5, tchen9, szhong}@cse.buffalo.edu
Abstract—Neural networks have been an active research area for decades. However, privacy bothers many when the training dataset for the neural networks is distributed between two parties, which is quite common nowadays. Existing cryptographic approaches such as the secure scalar product protocol provide a secure way for neural network learning when the training dataset is vertically partitioned. In this paper we present a privacy preserving algorithm for neural network learning when the dataset is arbitrarily partitioned between the two parties. We show that our algorithm is very secure and leaks no knowledge (except the final weights learned by both parties) about the other party's data. We demonstrate the efficiency of our algorithm by experiments on real world data.
Index Terms—Privacy, Arbitrarily Partitioned Data, Neural Network
I. INTRODUCTION
Neural networks have been an active research area for decades. Trained neural networks can predict efficient outputs which might be difficult to obtain in the real world. The expansion of the internet and the world wide web has made it easier to gather data from many sources [5], [10]. Training a neural network from distributed data is common: for example, making use of data from many hospitals to train a neural network to predict a certain disease, or collecting datasets of purchased items from different grocery stores and training a neural network from those to predict a certain pattern of purchased items. When training a neural network from distributed data, privacy is a major concern.
With the invention of new technologies, whether in data mining, in databases or in any network, resolving privacy problems has become very important. Because all sorts of data are collected from many sources, the field of machine learning is growing correspondingly, and so are the concerns regarding privacy. Data providers for machine learning are not willing to train the neural network with their data at the expense of privacy, and even if they do participate in the training they might either remove some information from their data or provide false information. Recent surveys [5] of web users conclude that a huge percentage of people are concerned about releasing their private data in any form to the outside world. The HIPAA (Health Insurance Portability and Accountability Act) rule [13] prohibits using individuals' medical records and other personal health related information for personal or distribution uses. Even insurance companies have to obtain permission to disclose anyone's health related data [11]. So whenever there is distributed data for machine learning, privacy measures are a must.

Correspondence Author: Sheng Zhong, Department of Computer Science and Engineering, State University of New York at Buffalo, Amherst, NY 14260, U. S. A. Phone: +1-716-645-3180 ext. 107. Fax: +1-716-645-3464. Email: szhong@cse.buffalo.edu
The datasets used for neural network training can be collectively
seen as a virtual database. In the distributed data
scenario this database can be partitioned in many ways. When
some rows of the database are with one party and the other
party holds the rest of the rows, this is called a horizontally
partitioned database. Such a case does not pose a significant
privacy threat for neural network training, since each data
holder can train the network in turns. When some columns of
the database are with one party and the other party holds the
rest of the columns, this is called vertical partitioning of the
training data. Chen and Zhong [6] propose a privacy-preserving
algorithm for neural network learning when the training data
is vertically partitioned. Their algorithm is efficient and
provides strong privacy guarantees. There is yet another
category of partitioned data (arbitrary partitioning), which
is studied in this paper. To the best of our knowledge,
the problem of privacy preserving neural network learning over
arbitrarily partitioned data has not been solved.
In arbitrary partitioning of data between two parties, there
is no specific order of how the data is divided between the two
parties. The combined data of the two parties can be collectively
seen as a database. Specifically, if we have a database D, consisting
of n rows {DB_1, DB_2, ..., DB_n}, and each row DB_i (i goes
from 1 to n) contains m attributes, then in each row, A holds
a subset DB^A_i of j attributes and B holds a subset DB^B_i of k
attributes (where k = m − j) such that DB_i = DB^A_i ∪ DB^B_i and
DB^A_i ∩ DB^B_i = ∅. In each row, the number of attributes in the
two subsets can be equal (j = k) but does not have to be equal,
that is, j ≠ k. It might happen that j = m, which means that A
completely holds that row; in rows where j = 0, B completely
holds that row.
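As an illustration, such a partition can be simulated by assigning each attribute of each row to one party at random and recording a 0 for the party that does not hold it, so that the two holdings sum to the original row (a toy sketch; the random 50/50 assignment is purely illustrative):

```python
import random

def arbitrarily_partition(db, seed=0):
    """Split each row's attributes between parties A and B.

    Each attribute of each row goes entirely to one party; the other
    party stores 0 for it, so DB_i = DB_i^A + DB_i^B element-wise.
    """
    rng = random.Random(seed)
    part_a, part_b = [], []
    for row in db:
        row_a, row_b = [], []
        for value in row:
            if rng.random() < 0.5:       # A holds this attribute
                row_a.append(value)
                row_b.append(0)
            else:                        # B holds this attribute
                row_a.append(0)
                row_b.append(value)
        part_a.append(row_a)
        part_b.append(row_b)
    return part_a, part_b

db = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
a, b = arbitrarily_partition(db)
# Reconstruction property: every attribute is held entirely by one party
assert all(x + y == z and (x == 0 or y == 0)
           for ra, rb, r in zip(a, b, db)
           for x, y, z in zip(ra, rb, r))
```

Horizontal and vertical partitioning fall out as special cases: a row with all attributes on one side is horizontally partitioned, and a fixed attribute-to-party assignment across all rows is vertical partitioning.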
In this paper we propose a privacy preserving algorithm
for back-propagation neural network learning when the data
is arbitrarily partitioned. Our contributions can be summarized
as follows. (1) To the best of our knowledge, we are the first to
propose a privacy-preserving algorithm for neural network learning
when the data is arbitrarily partitioned. (2) Our algorithm is
quite efficient in terms of computational and communication
overheads. (3) In terms of privacy, our algorithm leaks no
knowledge about the other party's data except the final weights
learned by the network at the end of training.
The rest of the paper is organized as follows. Section II
describes the related work. In Section III, we introduce the
technical preliminaries including definitions, notations and
problem statement. In Section IV, we present the privacy
preserving algorithm for the back propagation neural network
learning when the data is arbitrarily partitioned. We show
computation and communication overhead analysis of our
algorithm in Section V. In Section VI, we verify the accuracy
and efficiency of our algorithm by experiments on real world
data. In the end we conclude our paper.
II. RELATED WORK
Privacy preserving neural network learning has been studied
in [6], [3], [26]. Barni et al. [3] proposed security algorithms
for three scenarios in neural networks. (a) When the data is
being held by one party and network parameters (weights)
are being held by the other (b) When in addition to the
weights, the other party wants to preserve activation function
also (c) When the other party wants to preserve the network
topology. Their work is limited to the extent that only one
party holds the data and the other holds the parameters of the
network. The distributed data scenario is not discussed in their
paper. Chen and Zhong [6] propose a privacy-preserving back-propagation
neural network learning algorithm for vertically partitioned
training data. Their algorithm provides strong
privacy guarantees to the participants. The case of horizontally
partitioned training data is much easier, since all the data
holders can train the neural network in turns. In this paper we
address the problem where the training data for the neural
network is arbitrarily partitioned (defined below) between two
parties. We use the secure scalar product algorithm [27] and
Algorithm 3 of [6] in our algorithm so that both parties hold
only random shares after each round of training, without either
party learning the other party's data.
Privacy preserving algorithms have also been investigated
in data mining when the data to be mined is distributed
among different parties. For data mining, Agrawal et al.
[1] proposed randomization to preserve sensitive data at the
cost of accuracy. In order to preserve data accuracy, Lindell
and Pinkas [19] introduced cryptographic tools for privacy
preservation but the computation complexity increases with
huge data. Clifton et al. used commutative encryption property
to preserve privacy for the associative rule mining when
the data is either horizontally [23] or vertically partitioned
[17]. Jagannathan and Wright introduced a privacy-preserving
k-means clustering algorithm for datasets that are arbitrarily
partitioned [14]. There are many more privacy preserving
data mining algorithms. However, none of these algorithms
can be used directly for the problem of privacy preserving
neural network learning when the training data is arbitrarily
partitioned between two parties. In this paper we present a
privacy preserving back-propagation neural network learning
algorithm for data arbitrarily partitioned between two parties,
so that neither party is able to learn anything about the other
party's data except the final weights learned by the network.
There is also a general-purpose technique in cryptography,
called secure multi-party computation that can be applied
to privacy preserving neural network learning problems. In
particular, the protocol proposed by Yao in [28] can privately
compute any probabilistic polynomial function. Secure multi-party
computation can theoretically solve all problems of
privacy-preserving computation. However, it is very costly
to apply to practical problems [12]. Furthermore, in scenarios
in which neural networks are applied, the parties usually hold
huge amounts of data. Therefore, this general solution is
especially infeasible for our problem.
III. TECHNICAL PRELIMINARIES
In this section we present the problem definition, notations
used, and an overview of the algorithm we propose to preserve
privacy in neural network training from arbitrarily partitioned
data between two parties.
A. Definitions
In this section we briefly describe the concept of arbitrarily
partitioned data and give an overview of the problem statement.
• Arbitrary Partitioned Data: We consider arbitrary partitioning
of data between two parties in this paper. In
arbitrary partitioning of data between two parties, there
is no specific order of how the data is divided between the
two parties. The combined data of the two parties can be seen as a
database. Specifically, if we have a database D, consisting
of n rows {DB_1, DB_2, ..., DB_n}, and each row DB_i
(i goes from 1 to n) contains m attributes, then in each
row, DB^A_i is the subset of attributes held by A (say
j is the number of attributes in the subset DB^A_i) and
DB^B_i is the subset of attributes held by B (say k is the
number of attributes in the subset DB^B_i) such that DB_i
= DB^A_i ∪ DB^B_i and DB^A_i ∩ DB^B_i = ∅. In each row the
number of attributes in the two subsets can be equal (j = k)
but does not have to be equal, that is, j ≠ k. It might
happen that j = m, which means that A completely holds
that row, or j = 0, which means B completely holds that row.
In general, arbitrary partitioning is a generalization
covering combinations of horizontal and vertical partitions
of a database.
• Problem Definition: When the training data for the
neural network is arbitrarily partitioned between two
parties, both parties want to train the network, but at the
same time neither party wants the other to learn anything
about its data except the final weights learned by the
network. NOTE: We do not discuss rows whose attributes
are completely owned by one party. That case is trivial,
since the party who holds such a row can independently
train the network on it without revealing anything about
the data to the other party. So we propose a privacy
preserving back-propagation neural network learning
algorithm for data arbitrarily partitioned between two parties.
B. Notations
We consider a 3-layer (a-b-c configuration) neural network
in the paper but our work can easily be extended to any N-layer
neural network.
• The input vector is denoted as {x_1, x_2, ..., x_n}, where
any x_i (i goes from 1 to n) is an input to the corresponding
input node of the neural network. In this paper we consider
two parties (A and B) holding arbitrarily partitioned data. This
can be extended to n-party partitioning, but we leave that for
our future work. As discussed, the two parties share the
arbitrarily partitioned data such that, if an object1 vector of
n dimensions (n attributes) is denoted as x_1, x_2, ..., x_n,
then Party A holds x_1A, x_2A, ..., x_nA and Party B holds
x_1B, x_2B, ..., x_nB such that, for every object of the
virtual database,
x_1A + x_1B = x_1
x_2A + x_2B = x_2
and so on.
We require that for every attribute in {x_1, x_2, ..., x_n}
of every object, either x_iA or x_iB (i goes from 1 to n)
is 0. This means that each attribute is completely
held by one of the two parties.
• We assume the values of the hidden nodes to be
h_1, h_2, ..., h_b and the values of the output nodes to be
o_1, o_2, ..., o_c.
• Network Parameters: w^h_jk denotes the weight connecting
input layer node k and hidden layer node j.
w^o_ij denotes the weight connecting hidden layer node j and
output layer node i, where 1 ≤ k ≤ a, 1 ≤ j ≤ b, 1 ≤ i ≤ c.
Here a denotes the number of input nodes, b the number of
hidden nodes and c the number of output nodes.
C. Algorithm Overview
It is highly important that not only the data but also the
intermediate weights not be revealed to the other
party, because the intermediate weights contain partial knowledge
about the data. We propose an algorithm in which both parties
modify the weights and hold random shares of the weights
during training. Both parties use secure 2-party
computation [27] and Algorithm 3 of [6] to calculate the
random shares of the weights between training rounds.
Specifically, if x_1A, x_2A, ..., x_nA is an object held by A,
where x_iA (i varies from 1 to n) is an attribute in the row held
by A, and x_1B, x_2B, ..., x_nB is an object held by B, where
x_iB (i varies from 1 to n) is an attribute in the row held
by B, the algorithm modifies the weights until the output reaches
a target value t(x), where x = Σ_{i=1}^n (x_iA + x_iB) and t(x) is
any function. Both parties calculate the activation function
using [6], and they use the secure 2-party computation algorithm
[27] and Algorithm 3 of [6] to calculate random shares of the
weights in the proposed algorithm.
1 An object corresponds to a row in the database in this paper.
D. Security Model
In this paper, we assume the semi-honest model, which is a
standard security model in many privacy preserving papers
[19], [30]. The semi-honest model requires that all parties follow
the protocol, but any party might try to learn some information
from the intermediate results. So our aim is that no knowledge
about either party's data (except the final weights learned by
the network) is leaked in this model.
E. ElGamal scheme and Homomorphic Encryption
The ElGamal encryption scheme [8] and the homomorphic property
of the scheme are used in our algorithm. The homomorphic property
is a property of certain encryption algorithms where specific
algebraic operations (here, multiplication) can be performed on
plaintexts by performing operations on the encrypted messages,
without actually decrypting them. For example, say we have
two messages m1 and m2, and the encryption of each message is
denoted by E(m1) and E(m2); then the operation m1·m2 can be
performed using E(m1) and E(m2) only, without actually decrypting
the two messages. Specifically, for the ElGamal scheme,
we have

E(m1 · m2) = E(m1) · E(m2). (1)

This property of encryption is used in the secure scalar
product algorithm [27].
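The multiplicative homomorphism of equation (1) can be demonstrated with a toy ElGamal instance (the modulus p = 467 and group element g = 2 below are illustrative only and far too small for real security):

```python
import random

p = 467          # toy prime modulus (illustrative; far too small for real use)
g = 2            # toy group element

def keygen(rng):
    x = rng.randrange(2, p - 1)          # private key
    return x, pow(g, x, p)               # (private x, public h = g^x mod p)

def encrypt(h, m, rng):
    r = rng.randrange(2, p - 1)          # fresh randomness per ciphertext
    return (pow(g, r, p), (m * pow(h, r, p)) % p)

def decrypt(x, c):
    c1, c2 = c
    # c2 * c1^(-x) mod p; the exponent p-1-x gives the inverse by Fermat
    return (c2 * pow(c1, p - 1 - x, p)) % p

def e_mul(c, d):
    """Homomorphic property (1): E(m1) * E(m2) is an encryption of m1*m2."""
    return ((c[0] * d[0]) % p, (c[1] * d[1]) % p)

rng = random.Random(42)
x, h = keygen(rng)
m1, m2 = 7, 11
product_cipher = e_mul(encrypt(h, m1, rng), encrypt(h, m2, rng))
assert decrypt(x, product_cipher) == (m1 * m2) % p
```

The componentwise product of the two ciphertexts decrypts to the product of the plaintexts, which is exactly the property the secure scalar product protocol [27] builds on.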
IV. PRIVACY PRESERVING NEURAL NETWORK LEARNING
In this section we present the privacy preserving back-propagation
neural network learning algorithm over arbitrarily
partitioned data between two parties.
NOTE: In this algorithm, after each round of training both
parties hold only random shares of the weights and not
the exact weights; this guarantees more security and privacy
against intrusion by the other party. Only at the end
of training do both parties know the actual weights
of the neural network.
The error function which is used to calculate whether the
output is desired or not is given by:

e = (1/2) Σ_i (t_i − o_i)^2

where i varies over the output nodes.
If the value of this error function does not satisfy the output
requirements, we have some more rounds of training, repeating
the algorithm again. This error is propagated backwards and
requires changes in the weights according to the equations:

∂e/∂w^o_ij = −(t_i − o_i) h_j (2)

∂e/∂w^h_jk = −h_j (1 − h_j) x_k Σ_{i=1}^c [(t_i − o_i) w^o_ij] (3)
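In the clear (non-private) setting, equations (2) and (3) are the usual backpropagation gradients for a 3-layer network with sigmoid hidden nodes and linear output nodes; a minimal sketch with made-up toy weights and inputs:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradients(x, t, w_h, w_o):
    """Gradients of e = 1/2 * sum_i (t_i - o_i)^2 for a 3-layer net.

    w_h[j][k] connects input node k to hidden node j (sigmoid);
    w_o[i][j] connects hidden node j to output node i (linear).
    """
    h = [sigmoid(sum(w_h[j][k] * x[k] for k in range(len(x))))
         for j in range(len(w_h))]
    o = [sum(w_o[i][j] * h[j] for j in range(len(h)))
         for i in range(len(w_o))]
    # Equation (2): de/dw^o_ij = -(t_i - o_i) * h_j
    g_o = [[-(t[i] - o[i]) * h[j] for j in range(len(h))]
           for i in range(len(w_o))]
    # Equation (3): de/dw^h_jk = -h_j(1-h_j) x_k * sum_i (t_i - o_i) w^o_ij
    g_h = [[-h[j] * (1 - h[j]) * x[k]
            * sum((t[i] - o[i]) * w_o[i][j] for i in range(len(w_o)))
            for k in range(len(x))]
           for j in range(len(w_h))]
    return g_o, g_h

# Toy 2-3-1 network (weights and inputs are made up for illustration)
x, t = [0.5, -0.2], [1.0]
w_h = [[0.1, 0.2], [-0.3, 0.4], [0.05, -0.1]]
w_o = [[0.2, -0.1, 0.3]]
g_o, g_h = gradients(x, t, w_h, w_o)
```

A finite-difference check on the error function confirms that these closed forms match the numerical gradients.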
The owner of the network assigns random weights to the
neural network at the beginning of training. We explain the
procedure for one object; for the rest of the objects it is the
same. Let us assume party A holds x_1A, x_2A, ..., x_nA,
where any x_iA is an attribute in the row held by A, and party
B holds x_1B, x_2B, ..., x_nB, where any x_iB is an attribute in
the row held by B. For each input node i (i varies from 1 to
n) of the neural network, party A holds x_iA and party B holds
x_iB such that

x_iA + x_iB = x_i

in which either x_iA or x_iB is 0, because only one of the two
parties holds the input corresponding to that input node
of the input layer of the neural network.
The target value t(x) is known to both parties. The aim
of the algorithm is to train the network so as to modify the
weights w^h_jk and w^o_ij (w^h_jk denotes the weight connecting
input layer node k and hidden layer node j, and w^o_ij
denotes the weight connecting hidden layer node j and output
layer node i) so that, given the above input dataset distributed
between A and B, the output corresponds to nearly the target value.
During training, for each training sample, party A and party
B randomly share the weights w^h_jk and w^o_ij after each training
round, where w^h_jk = w^hA_jk + w^hB_jk (w^hA_jk is the share of
party A and w^hB_jk is the share of party B) and w^o_ij = w^oA_ij
+ w^oB_ij (w^oA_ij is the share of party A and w^oB_ij is the
share of party B).
At the end of each round of training, each party holds only a
random share of each weight. Algorithm 1 describes the feed
forward stage.
Algorithm 1 Privacy preserving back-propagation learning
algorithm – feed forward stage
→ Party A holds (x_1A, ..., x_nA), w^hA_jk and w^oA_ij.
→ Party B holds (x_1B, ..., x_nB), w^hB_jk and w^oB_ij. For each
weight, w^h_jk = w^hA_jk + w^hB_jk, w^o_ij = w^oA_ij + w^oB_ij.
For each hidden layer node h_j,
1) Using Algorithm 3, party A and B respectively
obtain random shares φ_A and φ_B for
Σ_{k=1}^a (w^hA_jk + w^hB_jk)(x_kA + x_kB).
2) Party A and B jointly compute the sigmoid function
for each hidden layer node h_j, obtaining the random
shares h_jA and h_jB respectively s.t. h_jA + h_jB =
f(φ_A + φ_B), using [6].
For each output layer node o_i,
1) Using Algorithm 3, party A and B respectively
obtain random shares o_iA and o_iB for
Σ_{j=1}^b (w^oA_ij + w^oB_ij)(h_jA + h_jB).
In Algorithm 1, party A and party B compute their random
shares φ_A and φ_B of Σ_{k=1}^a (w^hA_jk + w^hB_jk)(x_kA + x_kB)
using the secure scalar product algorithm [27] and Algorithm 3
of [6]. With the help of [6], they calculate the approximation
of the sigmoid function for each hidden layer node h_j, obtaining
h_jA and h_jB as their random shares (where h_jA is the share
held by A and h_jB is the share held by B). Then, with the help
of the secure scalar product algorithm again, party A and party B
calculate Σ_{j=1}^b (w^oA_ij + w^oB_ij)(h_jA + h_jB) and obtain o_iA and
o_iB as their random shares, where i ranges over the
output nodes of the neural network. After obtaining the random
shares o_iA and o_iB, they follow Algorithm 2, which is the back-error
propagation stage. NOTE: We do not calculate the error
function after every round, which would require both parties
to exchange the random shares o_iA and o_iB to calculate the
error function. Rather, we fix a certain number of rounds
after which they exchange the output shares to calculate
the error function.
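As a sanity check on the share arithmetic, the feed forward stage can be simulated in the clear. The helper share_product below stands in for Algorithm 3: in the real protocol it would be realized with the secure scalar product of [27] and Algorithm 3 of [6], whereas this toy sketch computes the product locally and splits it into random additive shares (not a secure implementation):

```python
import math, random

rng = random.Random(1)

def share(value):
    """Split a value into two random additive shares."""
    r = rng.uniform(-1, 1)
    return r, value - r

def share_product(ra, rb, sa, sb):
    """Stand-in for Algorithm 3: random shares of (ra + rb)(sa + sb).
    A real run would use the secure scalar product of [27]."""
    return share((ra + rb) * (sa + sb))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(xa, xb, wha, whb, woa, wob):
    # Hidden layer: phi_A + phi_B = sum_k (w^hA_jk + w^hB_jk)(x_kA + x_kB)
    h_a, h_b = [], []
    for j in range(len(wha)):
        phi_a = phi_b = 0.0
        for k in range(len(xa)):
            pa, pb = share_product(wha[j][k], whb[j][k], xa[k], xb[k])
            phi_a += pa
            phi_b += pb
        # In the real protocol the sigmoid shares come from [6]
        hj_a, hj_b = share(sigmoid(phi_a + phi_b))
        h_a.append(hj_a)
        h_b.append(hj_b)
    # Output layer: o_iA + o_iB = sum_j (w^oA_ij + w^oB_ij)(h_jA + h_jB)
    o_a, o_b = [], []
    for i in range(len(woa)):
        oi_a = oi_b = 0.0
        for j in range(len(h_a)):
            pa, pb = share_product(woa[i][j], wob[i][j], h_a[j], h_b[j])
            oi_a += pa
            oi_b += pb
        o_a.append(oi_a)
        o_b.append(oi_b)
    return o_a, o_b

# Toy 2-2-1 network; each attribute of x is held entirely by one party
xa, xb = [0.5, 0.0], [0.0, -0.2]
wha, whb = [[0.1, 0.2], [-0.3, 0.4]], [[0.0, 0.1], [0.2, -0.1]]
woa, wob = [[0.2, -0.1]], [[0.1, 0.3]]
o_a, o_b = feed_forward(xa, xb, wha, whb, woa, wob)
# o_a[i] + o_b[i] equals the output of the corresponding plain network
```

Summing the two parties' shares at every step reproduces exactly the plain feed forward computation, which is the correctness property the protocol relies on.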
Algorithm 2 Privacy preserving back-propagation learning
algorithm – back-propagation stage
→ Party A holds (x_1A, ..., x_nA), t_i, h_jA, o_iA, w^hA_jk and
w^oA_ij;
→ Party B holds (x_1B, ..., x_nB), t_i, h_jB, o_iB, w^hB_jk and
w^oB_ij.
For each output layer weight w^o_ij,
1) Using Algorithm 3, party A and B respectively obtain
random shares Δ_A w^o_ij and Δ_B w^o_ij for (o_iA + o_iB −
t_i)(h_jA + h_jB).
For each hidden layer weight w^h_jk,
1) Using Algorithm 3, party A and B respectively obtain
random shares μ_A and μ_B for Σ_{i=1}^c [(o_iA + o_iB −
t_i)(w^oA_ij + w^oB_ij)].
2) Party A and B respectively obtain random shares λ_A
and λ_B, such that λ_A + λ_B = (x_kA + x_kB)(μ_A + μ_B),
using Algorithm 3.
3) Party A and B securely compute (h_jA + h_jB)(1 − h_jA −
h_jB) by applying Algorithm 3, respectively obtaining
random shares θ_A and θ_B.
4) Using Algorithm 3, party A and B respectively obtain
random shares Δ_A w^h_jk and Δ_B w^h_jk for (θ_A +
θ_B)(λ_A + λ_B).
A computes w^oA_ij ← w^oA_ij − η(Δ_A w^o_ij); w^hA_jk ← w^hA_jk −
η(Δ_A w^h_jk).
B computes w^oB_ij ← w^oB_ij − η(Δ_B w^o_ij); w^hB_jk ← w^hB_jk −
η(Δ_B w^h_jk).
Algorithm 2 is the back-error propagation stage. This stage
helps to modify the weights so as to achieve correct weights
in the neural network. Both A and B modify their weights
according to equations (2) and (3). After some rounds of training,
both A and B share their outputs to calculate the error function
e = (1/2) Σ_i (t_i − o_i)^2; if the error is too large, the two parties
A and B run more rounds of training to achieve the target
function. Error propagation means that we are trying to modify
the values of the weights so as to achieve their correct values.
In this algorithm, for each output layer weight w^o_ij,
both parties obtain the random shares of the changes in weights
Δ_A w^o_ij and Δ_B w^o_ij (where Δ_A w^o_ij is the share held by A
and Δ_B w^o_ij is the share held by B) from equation (2) using
the secure scalar product protocol [27] and Algorithm 3 of [6],
where t_i is the target value of the i-th output node of the neural
network.
For the hidden layer weights w^h_jk, we break
equation (3) above into three parts. First, the
two parties calculate random shares μ_A and μ_B from
Σ_{i=1}^c [(o_iA + o_iB − t_i)(w^oA_ij + w^oB_ij)] using [27] and Algorithm
3 of [6] (where μ_A is the share held by A and μ_B is
the share held by B). With the help of these shares they
calculate random shares λ_A and λ_B using the secure scalar
product algorithm [27] and Algorithm 3 of [6], such that
λ_A + λ_B = (x_kA + x_kB)(μ_A + μ_B) (where λ_A is the share
held by A and λ_B is the share held by B). Then the two
parties calculate (h_jA + h_jB)(1 − h_jA − h_jB) to obtain the
random shares θ_A and θ_B (where θ_A is the share held by
A and θ_B is the share held by B). Finally they calculate
(θ_A + θ_B)(λ_A + λ_B), obtaining Δ_A w^h_jk and Δ_B w^h_jk as the
random shares.
After obtaining the random shares,
• A computes w^oA_ij ← w^oA_ij − η(Δ_A w^o_ij) and
w^hA_jk ← w^hA_jk − η(Δ_A w^h_jk).
• B computes w^oB_ij ← w^oB_ij − η(Δ_B w^o_ij) and
w^hB_jk ← w^hB_jk − η(Δ_B w^h_jk).
where η is the network learning rate.
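The output-layer update of Algorithm 2 can be traced on toy share values (all numbers are made up for illustration; share_product again stands in for Algorithm 3 and computes the product locally rather than securely):

```python
import random

rng = random.Random(7)

def share(value):
    """Split a value into two random additive shares."""
    r = rng.uniform(-1, 1)
    return r, value - r

def share_product(ra, rb, sa, sb):
    """Stand-in for Algorithm 3: random shares of (ra + rb)(sa + sb)."""
    return share((ra + rb) * (sa + sb))

eta = 0.2  # learning rate

# Shares held after the feed forward stage (made-up toy values)
o_a, o_b = 0.31, 0.44        # o_i = o_a + o_b
h_a, h_b = 0.12, 0.58        # h_j = h_a + h_b
w_a, w_b = 0.05, 0.15        # w^o_ij = w_a + w_b
t_i = 1.0                    # target value, known to both parties

# Step 1: shares of (o_iA + o_iB - t_i)(h_jA + h_jB); B folds -t_i into
# its own input since t_i is known to both parties
d_a, d_b = share_product(o_a, o_b - t_i, h_a, h_b)

# Each party updates only its own share of the weight
w_a_new = w_a - eta * d_a
w_b_new = w_b - eta * d_b

# The implicit full weight moved by -eta * (o_i - t_i) * h_j, as in (2)
full_delta = (w_a_new + w_b_new) - (w_a + w_b)
```

Because the update is applied share-wise, the reconstructed weight performs exactly the gradient step of equation (2) even though neither party ever sees the full weight or the full gradient.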
Algorithm 3 Securely computing (R_A + R_B)(S_A + S_B)
→ Party A holds R_A and S_A. → Party B holds R_B and S_B.
1) Using the secure scalar product algorithm [27] and Algorithm
3 of [6], party A and B respectively obtain
random shares α_A and α_B for R_A S_B, and random
shares β_A, β_B for R_B S_A.
2) Party A computes R_A S_A + α_A + β_A. Party B computes
R_B S_B − α_B − β_B, such that (R_A + R_B)(S_A + S_B) =
R_A S_A + α_A + β_A + R_B S_B − α_B − β_B.
Algorithm 3 describes how party A and party B calculate
any expression of the form (R_A + R_B)(S_A + S_B). They both use
the secure scalar product algorithm [27] and Algorithm 3 of [6] to
calculate R_A S_B and R_B S_A. The secure scalar product algorithm
computes the product of two vectors such that at the
end of the calculation each party holds a random share of the
result, so that neither party can infer the other party's vector.
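The share arithmetic of Algorithm 3 can be checked with a toy simulation in which the secure scalar product of [27] is replaced by a local split into difference shares (α_A − α_B = R_A·S_B and β_A − β_B = R_B·S_A), matching the sign convention above:

```python
import random

rng = random.Random(3)

def diff_share(value):
    """Return (u, v) with u - v == value, u uniformly random."""
    u = rng.uniform(-10, 10)
    return u, u - value

def algorithm3(RA, SA, RB, SB):
    # In the real protocol these shares come from the secure scalar
    # product algorithm [27]; here they are split locally instead.
    alpha_a, alpha_b = diff_share(RA * SB)   # alpha_a - alpha_b = RA*SB
    beta_a, beta_b = diff_share(RB * SA)     # beta_a  - beta_b  = RB*SA
    share_a = RA * SA + alpha_a + beta_a     # computed by party A alone
    share_b = RB * SB - alpha_b - beta_b     # computed by party B alone
    return share_a, share_b

RA, SA, RB, SB = 2.0, 0.5, -1.0, 3.0
sa_, sb_ = algorithm3(RA, SA, RB, SB)
# The two shares sum to (RA + RB)(SA + SB)
assert abs((sa_ + sb_) - (RA + RB) * (SA + SB)) < 1e-12
```

Expanding (R_A + R_B)(S_A + S_B) = R_A S_A + R_A S_B + R_B S_A + R_B S_B shows why the two local sums reconstruct the product: the cross terms are supplied by the difference shares while each party computes its own square term locally.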
V. COMPUTATION COMPLEXITY AND COMMUNICATION
OVERHEAD ANALYSIS
In this section we present the computation and communication
overhead analysis of our privacy preserving algorithm.
1) Computation Complexity Analysis:
• Securely computing the scalar product of two vectors
– Let t_1 be the time taken by
A to generate a public-private key pair and send the
public key to B. A does n encryptions, where n
is the number of dimensions of the vector. B does
n+1 encryptions. Then B performs (2n+1) multiplications.
So the total time taken by this algorithm
is T_1 = t_1 + (2n+1)E + (2n+1)M, where E is
the time taken to encrypt one message and M is the
time taken for one multiplication.
• Securely computing an equation of the form (R_A +
R_B)(S_A + S_B) – Since A and B can run the
algorithm in parallel, they obtain random shares
for R_A S_B and R_B S_A in T_1 time. Party A does 1
multiplication and 2 additions, and B does 1 multiplication
and 2 subtractions. We assume the time
taken for addition and subtraction to be negligible
in comparison to multiplication and neglect it.
So the total time taken to compute an equation of this
form is T_2 = T_1 + 2M.
• Privacy preserving back-propagation learning algorithm
– feed forward stage – Step 1 of
Algorithm 1 takes aT_2 + bz time, where z = (2p +
1)C + 2D [6], p is the system approximation
parameter, C is the cost of encryption and D is the
cost of partial decryption. Step 2 of the algorithm
takes bT_2 time.
So the total time taken in the feed forward stage is
aT_2 + bz + bT_2.
• Privacy preserving back-propagation learning algorithm
– back propagation stage – The equation
in Step 1 of Algorithm 2 can be rewritten as
(o_iA + o_iB − t_i)(h_jA + h_jB) = (o_iA + o_iB)(h_jA +
h_jB) − t_i h_jA − t_i h_jB. So Step 1 of the algorithm
for the back propagation stage takes c(T_2 + 2M) time, where
M again is the time taken for one multiplication. Step
2.1 of the algorithm takes bc(T_2 + 2M) time. Step
2.2 consumes bT_2 time. Step 2.3 can be broken down
so that it takes 2 multiplications and T_1 time, i.e.,
b(2M + T_1) time in total. Step 2.4 takes bT_2 time.
So the total time taken in the back propagation stage is
c(T_2 + 2M) + bc(T_2 + 2M) + 3bT_2.
2) Communication Overhead Analysis:
• Communication overhead – To securely calculate
the product of two integers, 2n+2
messages are exchanged between Party A and Party B [6]. With
each message being s bits long, each such computation
takes (2n+2)s bits. In the feed forward stage, each
hidden layer node needs b[a(T_1) + T_2] bits
and each output layer node takes c(bT_1) bits,
where T_1 = 2s(2n+2) and T_2 = s(2n+2). In the
back propagation stage of the algorithm, the first
part needs c(bT_1) bits and the
second part needs b[c(bT_1) + 3T_1] bits.
VI. EVALUATION
In this section we present experiments comparing the
privacy preserving version of the algorithm with the
non-privacy version in order to calculate the accuracy
loss (defined below). The experiments are carried out on
data from the UCI dataset repository [2].
A. Set Up
We have used C++ to implement our algorithm with g++
version 2.8. The experiments are carried out on a Linux
operating system (Ubuntu) with 1.9 GHz Intel processors and
2 GB of memory.
The experiments are performed on real world data
from the UCI dataset repository [2]. Table I shows the training
parameters, number of epochs (number of training rounds),
and architecture (number of input nodes, hidden nodes, output
nodes) used in the neural network model. The weights are
initialized randomly in the range [-0.1, 0.1]. We have trained
the neural network using the Iris, Dermatology, Sonar and Landsat
datasets. The attributes of each row of the datasets are
randomly divided between the two parties (A and B) so that the
datasets can be modeled as arbitrarily partitioned between A
and B. The network learning rate for each dataset is set
to 0.2. The number of input nodes for the neural network
depends on each dataset, and the hidden nodes are chosen such
that there are at least 3 hidden nodes for each output.
The test samples (for each dataset) for the experiments are
taken randomly from the datasets. Specifically, 20 test
samples are taken randomly each for Iris and Sonar, and 30
each for Dermatology and Landsat. The number of epochs
is kept small for large datasets like Landsat and larger for the
other datasets. After training (completion of the epochs for the
respective dataset), each test sample is run against the network
to observe whether it is misclassified (assigned to a different
class) or belongs to the correct class.
B. Experimental Results
The main objective of our experiment is to measure the
accuracy loss of our algorithm as a cost of protecting privacy.
Accuracy loss is the loss which occurs when applying
cryptographic schemes [8] to the non-privacy version of the
algorithm to protect each party's data and the random shares of
the intermediate computations. Since our algorithm uses two
approximations (described below), accuracy loss takes place when
we compare the non-privacy version (when both parties are
not worried about revealing their data to the outside world) with
the privacy version (when both parties are worried about revealing
their data to the outside world), which uses the
approximations. The accuracy loss
for each dataset is calculated using the equation

Accuracy Loss = T1 − T2

where T1 is the test error rate for the privacy version of the
algorithm and T2 is the test error rate for the non-privacy version
of the algorithm. Test error rates for the privacy as well as the
non-privacy version of the algorithm are calculated using the
equation

Test Error Rate = (No. of Test Samples Misclassified) / (Total No. of Training Samples)

Cryptographic operations are required whenever there are
privacy issues, so accuracy loss is inevitable.
The accuracy loss of a privacy-preserving learning algorithm
like ours comes from two approximations. One is the approximation
of the sigmoid function, and the other is the approximation
of real numbers. To calculate the sigmoid function we use [6],
but [6] uses a piecewise linear approximation of the sigmoid
function given by

y(x) = 1,                   x > 8
       0.015625x + 0.875,   4 < x ≤ 8
       0.03125x + 0.8125,   2 < x ≤ 4
       0.125x + 0.625,      1 < x ≤ 2
       0.25x + 0.5,        −1 < x ≤ 1
       0.125x + 0.375,     −2 < x ≤ −1
       0.03125x + 0.1875,  −4 < x ≤ −2
       0.015625x + 0.125,  −8 < x ≤ −4
       0,                   x ≤ −8
(4)

Table I
DATASETS AND PARAMETERS
Dataset      Sample  Class  Architecture  Epochs  Learning Rate
Iris         150     3      4-5-3         80      0.2
Dermatology  366     6      34-3-6        80      0.2
Sonar        208     2      60-6-2        125     0.2
Landsat      6435    6      36-3-6        8       0.2

Table II
TEST ERROR RATES COMPARISON
Dataset      Non-privacy-preserving Version  Privacy-preserving Algorithm
Iris         20.00%                          25.00%
Dermatology  36.66%                          43.33%
Sonar        35.00%                          40.00%
Landsat      23.33%                          26.66%
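Equation (4) translates directly into code; the following sketch also measures how far the approximation strays from the true sigmoid on [-10, 10]:

```python
import math

def approx_sigmoid(x):
    """Piecewise linear approximation of the sigmoid, equation (4)."""
    if x > 8:
        return 1.0
    if x > 4:
        return 0.015625 * x + 0.875
    if x > 2:
        return 0.03125 * x + 0.8125
    if x > 1:
        return 0.125 * x + 0.625
    if x > -1:
        return 0.25 * x + 0.5
    if x > -2:
        return 0.125 * x + 0.375
    if x > -4:
        return 0.03125 * x + 0.1875
    if x > -8:
        return 0.015625 * x + 0.125
    return 0.0

# Worst-case deviation from the true sigmoid on a grid over [-10, 10]
worst = max(abs(approx_sigmoid(v / 10) - 1 / (1 + math.exp(-v / 10)))
            for v in range(-100, 101))
```

The pieces are continuous at every breakpoint (for example, both sides give 0.75 at x = 1 and 1.0 at x = 8), so the approximation introduces no jumps, only a bounded pointwise error.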
We also map real numbers to the finite fields when applying
cryptographic algorithms for ElGamal scheme [8] because
cryptographic operations are on discrete finite fields. These are
the two approximations introduced in our privacy preserving
version of the algorithm.
Table II shows the results for the non-privacy versus the
privacy version of the algorithm. Accuracy loss is unavoidable,
since cryptographic operations are on discrete finite fields.
Because we have fixed the number of epochs at the beginning
of training, the minimum testing error might not be achieved.
But as can be seen, the accuracy loss varies from 3.33% for
Landsat to 6.67% for Dermatology. Since the accuracy loss is
within limits, our algorithm is quite effective in learning these
real world datasets.
VII. CONCLUSION
In this paper, we present a privacy preserving back-propagation
neural network training algorithm for training
data arbitrarily partitioned between two parties. We assume
a semi-honest model, and our algorithm is quite secure since
the intermediate results are randomly shared between the two
parties. The experiments we perform on real world data
show that the accuracy loss is within limits.
REFERENCES
[1] Agrawal, D., Srikant, R. (2000). Privacy preserving data mining, In
Proc. ACM SIGMOD, 439-450.
[2] Blake, C.L., Merz, C.J. (1998). UCI Repository of machine learning
databases, http://www.ics.uci.edu/ mlearn/MLRepository.html, Irvine, CA:
University of California, Department of Information and Computer Sci-ence.
7. 7
[3] Barni, M., Orlandi, C., Piva, A. (2006). A Privacy-Preserving Pro-tocol
for Neural-Network-Based Computation, in Proceeding of the 8th
workshop on Multimedia and security. 146-151.
[4] Chang, Y. C., Lu. C. J. (2001). Oblivious polynomial evaluation and
oblivious neural learning, In Proceedings of Asiacrypt, 369-384.
[5] Cranor, L. F., Reagle. J., Ackerman. M. S. (1999). Beyond
concern: Understanding net Users attitudes about online privacy.
Technical report TR 99.4.3, ATT Labs-Research, Available from
http://www.research.att.com/library/trs/TRs/99/99.4/99.4.3/report.htm.
[6] Chen, T., Zhong, S., (2009). Privacy Preserving Back-Propagation
Neural Network Learning, IEEE Transactions on Neural Networks, 20(10)
1554 - 1564.
[7] Du, W., Han, Y. S. Chen, S., (2004). Privacy-preserving multivariate
Statistical Analysis :Linear Regression and Classification. Proceedings of
the SIAM International Conference on Data Mining.
[8] ElGamal, T., (1985). A Public-Key Cryptosystem and a Signature Scheme
Based on Discrete Logarithms, IEEE Trans.Information Theory, 31(4).
469-472.
[9] Editorial. (2000). Whose scans are they, anyway?, Nature, 406, 443.
[10] Lorrie faith cranor, editor. (1999). Special Issue on Internet Pri-vate.
Comm.ACM. 42(2).
[11] (2001). Standard for privacy of individually identifiable health informa-tion.
Federal Register, 66(40).
[12] Goldreich, O.,Micali, S., Wigderson, A. (1987). How to play ANY
mental game, In Proceedings of Annual ACM Conference on Theory of
Computing, 218-229.
[13] HIPPA, National Standards to Protect the Privacy of Personal Health
Information, http://www.hhs.gov/ocr/hipaa/finalreg.html.
[14] Jagannathan, G. Wright, R. N. (2005). Privacy-preserving dis-tributed
k-means clustering over arbitrarily partitioned data. In Proc. ACM
SIGKDD, 2005, 593-599.
[15] Kantarcioglu, M., Clifton, C. (2002). Privacy-preserving distributed
mining of association rules on horizontally partitioned data. In The ACM
SIGMOD Workshop on Research Issues on Data Mining and Knowledge
Discovery (DMKD’02).
[16] Kantarcioglu, M., Vaidya, J. (2003). Privacy preserving naive Bayes
classifier for Horizontally partitioned data. In IEEE Workshop on Privacy
Preserving Data Mining.
[17] Kantarcioglu, M,, Clifton, C. (2004). Privacy-preserving distributed
mining of association rules on horizontally partitioned data. IEEE Trans.
Knowledge Data Eng., 16(4).
[18] Lippmann, R. P. (1987). An introduction to computing with neural
networks, IEEE Acoustics Speech and Signal Processing Magazine, 4, 4-
22.
[19] Lindell, Y., Pinkas, B. (2000). Privacy preserving data mining, in
Proceedings of the 20th Annual International Cryptology Conference on
Advances in Cryptology, 1880. 36-4.
[20] Lindell, Y., Pinkas, B. (2002). Privacy preserving data mining, Journal
of Cryptology, 15(3), 177-206.
[21] Rumelhart, D. E., Widrow, B., Lehr, M. A. (1994). The basic ideas
in neural networks, Communications of the ACM, 37, 87-92.
[22] Vaidya, J., Clifton, C. (2002). Privacy preserving association rule
mining in vertically partitioned data. In Proc. of SIGKDD'02, 639-644.
[23] Vaidya, J., Clifton, C. (2002). Privacy preserving association rule
mining in vertically partitioned data. In The Eighth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 639-644.
[24] Vaidya, J., Clifton, C. (2004). Privacy preserving naive Bayes classifier
for vertically partitioned data. In Proc. SIAM International Conference on
Data Mining.
[25] Westin, A. F. (1999). Freebies and privacy: What net users think. Technical
Report, Opinion Research Corporation, July 1999. Available from
http://www.privacyexchange.org/iss/surveys/sr99014.html.
[26] Wan, L., Ng, W. K., Han, S., Lee, V. C. S. (2007). Privacy-preservation
for gradient descent methods. In Proc. of ACM SIGKDD, 775-783.
[27] Wright, R., Yang, Z. (2004). Privacy preserving Bayesian network
structure computation on distributed heterogeneous data. In Proc. of the
Tenth ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, 713-718.
[28] Yao, A. C. (1982). Protocols for secure computations. In Proceedings of
the 23rd Annual Symposium on Foundations of Computer Science, 160-164.
[29] Yao, A. C. (1986). How to generate and exchange secrets. In Proc. of
the 27th IEEE Symposium on Foundations of Computer Science, 162-167.
[30] Yang, Z., Zhong, S., Wright, R. (2005). Privacy-preserving classification
of customer data without loss of accuracy. In Proc. 5th SIAM
International Conference on Data Mining (SDM).
Ankur Bansal received his B.Tech degree in Electronics and Communication
Engineering from Ambala College of Engineering and Applied Research,
India, in 2007. He is currently a Master's candidate at the Department of
Computer Science and Engineering, the State University of New York at
Buffalo, U.S.A. His research interests include data privacy issues and
economic incentives in wireless networks.
Tingting Chen received her B.S. and M.S. degrees in computer science
from the Department of Computer Science and Technology, Harbin Institute of
Technology, China, in 2004 and 2006, respectively. She is currently a Ph.D.
candidate at the Department of Computer Science and Engineering, the State
University of New York at Buffalo, U.S.A. Her research interests include
data privacy and economic incentives in wireless networks.
Sheng Zhong is an assistant professor at the Computer Science and Engineering
Department of the State University of New York at Buffalo. He received his
B.S. (1996) and M.E. (1999) from Nanjing University, and his Ph.D. (2004) from
Yale University, all in computer science. His research interests include privacy
and incentives in data mining and databases, and economic incentives in
wireless networks.