This document summarizes various techniques for anonymizing data to protect privacy and security when data is stored in the cloud. It discusses how anonymization removes identifying attributes from data to prevent individuals from being identified. The document reviews existing anonymization models like k-anonymity, l-diversity and t-closeness. It then describes different anonymization techniques like hashing, hiding, permutation, shifting, truncation, prefix-preserving and enumeration that were implemented to anonymize data fields. The goal is to anonymize data in a way that balances privacy, security, and the ability to still use the data for appropriate purposes.
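As an illustration of the field-level techniques listed above (hashing, hiding, shifting, truncation, prefix-preserving, permutation, enumeration), a minimal Python sketch may help; all function names, defaults, and the simplified prefix-preserving scheme are illustrative choices of this summary, not taken from the document:

```python
import hashlib
import random

def hash_field(value: str, salt: str = "s3cret") -> str:
    """Hashing: replace a value with a salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode()).hexdigest()

def hide_field(value: str) -> str:
    """Hiding: blank out every character."""
    return "*" * len(value)

def truncate_field(value: str, keep: int = 3) -> str:
    """Truncation: keep only the first few characters."""
    return value[:keep]

def shift_field(value: int, offset: int = 17) -> int:
    """Shifting: displace a numeric value by a fixed offset."""
    return value + offset

def prefix_preserving_ip(ip: str, keep_octets: int = 2) -> str:
    """Very simplified stand-in for prefix-preserving IP anonymization:
    keep the network prefix, zero the rest (real schemes such as
    Crypto-PAn are cryptographic, not a plain mask)."""
    parts = ip.split(".")
    return ".".join(parts[:keep_octets] + ["0"] * (4 - keep_octets))

def permute_fields(values, seed=0):
    """Permutation: shuffle a column so values detach from their rows."""
    out = list(values)
    random.Random(seed).shuffle(out)
    return out

def enumerate_fields(values):
    """Enumeration: map each distinct value to a sequential ID."""
    mapping, out = {}, []
    for v in values:
        mapping.setdefault(v, len(mapping) + 1)
        out.append(mapping[v])
    return out
```

Each helper trades a different amount of utility for privacy: hashing keeps linkability without revealing the value, truncation and prefix-preservation keep coarse structure, permutation and enumeration keep distributions.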
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Nowadays, data sharing between two organizations is common in many application areas, such as business planning or marketing. When data are to be shared between parties, some of the data may be sensitive and should not be disclosed to the other parties. Medical records in particular are sensitive, so privacy protection must be taken seriously. As required by the Health Insurance Portability and Accountability Act (HIPAA), it is necessary to protect the privacy of patients and ensure the security of medical data. To address this problem, released datasets must unavoidably be modified. We propose and implement a method called the Hybrid approach for privacy preserving. First we randomize the original data; then we apply generalization to the randomized data. This technique protects private data with better accuracy, can reconstruct the original data, and provides data with no information loss, preserving the usability of the data.
Volume 7, Issue 8, August 2019, International Journal of Research in Advent Technology (IJRAT), ISSN: 2321-9637 (Online). Published by MG Aricent Pvt Ltd.
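The two-step pipeline described in the abstract above (randomize first, then generalize, while keeping enough information to reconstruct the originals) can be sketched as follows; the noise spread, bucket width, and the idea of retaining the noise vector as the reconstruction key are illustrative assumptions, not details from the paper:

```python
import random

def randomize(values, spread=5, seed=42):
    """Step 1: perturb each value with uniform noise; the data holder
    keeps the noise vector so the originals can be reconstructed."""
    rng = random.Random(seed)
    noise = [rng.randint(-spread, spread) for _ in values]
    return [v + n for v, n in zip(values, noise)], noise

def generalize(values, width=10):
    """Step 2: replace each randomized value with its range bucket."""
    return [f"{(v // width) * width}-{(v // width) * width + width - 1}"
            for v in values]

def reconstruct(randomized, noise):
    """The data holder can undo step 1 exactly: no information loss."""
    return [v - n for v, n in zip(randomized, noise)]

ages = [23, 37, 45, 62]
randomized, noise = randomize(ages)
published = generalize(randomized)   # what outside parties see
```

Outsiders only see generalized ranges of already-perturbed values, while the holder of the noise vector recovers the exact originals, which is the "no information loss" property the abstract claims.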
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Privacy is an important issue in data mining and knowledge discovery. In this paper, we propose to use randomized response techniques to conduct the data mining computation. Specifically, we present a method to build decision tree classifiers from the disguised data. We conduct experiments to compare the accuracy of our decision tree with the one built from the original, undisguised data. Our results show that although the data are disguised, our method can still achieve fairly high accuracy. We also show how the parameter used in the randomized response techniques affects the accuracy of the results.
Keywords: Privacy, security, decision tree, data mining
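The core of the randomized response idea in the abstract above, in Warner's classic binary form, is that each respondent tells the truth only with probability p, and the analyst inverts the expectation to estimate the true proportion. A small sketch (the population, p, and seed are illustrative, not from the paper):

```python
import random

def disguise(answers, p=0.7, seed=1):
    """Each respondent reports the true answer with probability p,
    otherwise reports the opposite: no single record is trustworthy."""
    rng = random.Random(seed)
    return [a if rng.random() < p else 1 - a for a in answers]

def estimate_true_proportion(disguised, p=0.7):
    """Invert E[observed] = p*pi + (1-p)*(1-pi) to recover pi."""
    observed = sum(disguised) / len(disguised)
    return (observed - (1 - p)) / (2 * p - 1)

true_answers = [1] * 300 + [0] * 700   # true proportion pi = 0.3
disguised = disguise(true_answers)
est = estimate_true_proportion(disguised)
```

This is the trade-off the paper's parameter study measures: as p approaches 0.5 the disguise gets stronger but the denominator 2p - 1 shrinks, so the estimate (and any classifier built on it) gets noisier.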
In this era, there is a need to secure data in distributed database systems. For collaborative data publishing, some anonymization techniques are available, such as generalization and bucketization. We consider the attack that can be called an "insider attack," in which colluding data providers may use their own records to infer the records of others. To protect the database from these types of attacks we use the slicing technique for anonymization, as the above techniques are not suitable for high-dimensional data: they cause loss of data and also require a clear separation of quasi-identifiers and sensitive attributes. We consider this threat and make several contributions. First, we introduce a notion of data privacy and use the slicing technique, which partitions the data both vertically and horizontally and shows that the anonymized data satisfies privacy and security requirements. Second, we present verification algorithms which prove security against a number of data providers and ensure high utility and data privacy of the anonymized data with efficiency. For the experimental results we use hospital patient datasets; the results suggest that our slicing approach achieves better or comparable utility and efficiency than baseline algorithms while satisfying data security. Our experiment also demonstrates the difference in computation time between the encryption algorithm used to secure data and our system.
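The vertical-plus-horizontal partitioning that slicing performs, as described above, can be sketched briefly; the column grouping, bucket size, and table are illustrative assumptions, not the paper's actual configuration:

```python
import random

def slice_table(rows, column_groups, bucket_size=2, seed=0):
    """Slicing sketch: group attributes vertically (column_groups),
    split tuples into horizontal buckets, then independently shuffle
    each column group within every bucket so that the cross-column
    linkage between quasi-identifiers and sensitive values is broken."""
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        pieces = []
        for group in column_groups:
            vals = [tuple(r[c] for c in group) for r in bucket]
            rng.shuffle(vals)
            pieces.append(vals)
        # re-assemble: each output row pairs one value per column group
        sliced.extend(list(zip(*pieces)))
    return sliced

rows = [
    {"age": 22, "zip": "47906", "disease": "flu"},
    {"age": 22, "zip": "47906", "disease": "cancer"},
    {"age": 33, "zip": "47905", "disease": "flu"},
    {"age": 52, "zip": "47905", "disease": "asthma"},
]
out = slice_table(rows, column_groups=[("age", "zip"), ("disease",)])
```

All values survive (good utility), but within a bucket an attacker can no longer tell which (age, zip) pair goes with which disease, which is what defeats the colluding-provider linkage attack.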
A Review Study on the Privacy Preserving Data Mining Techniques and Approaches
In this paper we review the various privacy preserving data mining techniques, such as data modification and secure multiparty computation, from different aspects.
Index Terms: Privacy and Security, Data Mining, Privacy Preserving, Secure Multiparty Computation (SMC), Data Modification
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
Leakage and misuse of sensitive data is a challenging problem for enterprises, and it has become more serious with the advent of cloud and big data. The rationale behind this is the increase in outsourcing of data to the public cloud and publishing of data for wider visibility. Therefore Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDDM) are crucial in the contemporary era. PPDP and PPDM can protect privacy at the data and process levels, respectively. With big data, privacy has become indispensable because data is stored and processed in a semi-trusted environment. In this paper we propose a comprehensive methodology for effective sanitization of data based on a misusability measure, preserving privacy to prevent data leakage and misuse. We follow a hybrid approach that caters to the needs of privacy preserving MapReduce programming. We propose an algorithm known as the Misusability Measure-Based Privacy Preserving Algorithm (MMPP), which considers the level of misusability prior to choosing and applying appropriate sanitization to big data. Our empirical study with Amazon EC2 and EMR revealed that the proposed methodology is useful in realizing privacy preserving MapReduce programming.
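The idea of scoring misusability first and choosing the sanitization strength from the score, as the abstract above describes, can be illustrated in miniature; the scoring rule, the thresholds, and the two sanitization actions are purely hypothetical stand-ins, not the paper's MMPP algorithm:

```python
def misusability_score(record, sensitive_fields):
    """Toy score: fraction of sensitive fields present and non-empty.
    A real misusability measure would weight fields by harm potential."""
    present = sum(1 for f in sensitive_fields if record.get(f))
    return present / len(sensitive_fields)

def sanitize(record, sensitive_fields, low=0.34, high=0.67):
    """Pick a sanitization strength from the score: leave as-is,
    mask, or suppress. Thresholds here are illustrative only."""
    score = misusability_score(record, sensitive_fields)
    out = dict(record)
    for f in sensitive_fields:
        if f not in out:
            continue
        if score >= high:
            out[f] = None                    # suppress entirely
        elif score >= low:
            out[f] = str(out[f])[:1] + "*"   # mask (weak generalization)
    return out
```

The point of score-first sanitization is economy: records that cannot be badly misused keep full utility, and only the risky ones pay the information cost.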
Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply for data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.
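The ℓ-diversity requirement mentioned in the abstract above, that each bucket of the sliced data contain at least ℓ well-represented sensitive values, can be checked and greedily constructed as follows; this is a simple distinct-ℓ-diversity sketch of my own, not the paper's efficient algorithm:

```python
from collections import defaultdict

def is_l_diverse(buckets, l):
    """Distinct l-diversity: every bucket holds >= l distinct
    sensitive values (records are (id, sensitive_value) pairs)."""
    return all(len({s for _, s in b}) >= l for b in buckets)

def make_l_diverse_buckets(records, l):
    """Greedy sketch: pool records by sensitive value, then repeatedly
    draw one record from each of the l largest pools to form a bucket.
    Leftovers that cannot reach l distinct values are suppressed."""
    pools = defaultdict(list)
    for rec in records:
        pools[rec[1]].append(rec)
    buckets = []
    while True:
        biggest = sorted(pools, key=lambda v: len(pools[v]),
                         reverse=True)[:l]
        if len(biggest) < l:
            break
        buckets.append([pools[v].pop() for v in biggest])
        for v in list(pools):
            if not pools[v]:
                del pools[v]
    return buckets
```

Drawing from the largest pools first keeps the remaining pools balanced, which maximizes how many records can be published rather than suppressed.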
Cluster Based Access Privilege Management Scheme for Databases
Knowledge discovery is carried out using data mining techniques. Association rule mining, classification and clustering operations are carried out under data mining. The clustering method is used to group records based on relevancy; distance or similarity measures are used to estimate the transaction relationship. Census data and medical data are referred to as microdata. Data publishing schemes are used to provide private data for analysis. Privacy preservation is used to protect private data values, and anonymity is considered in the privacy preservation process.

Data values are made available to authorized users through access control models. The Privacy Protection Mechanism (PPM) uses suppression and generalization of relational data to anonymize and satisfy privacy needs. An accuracy-constrained privacy-preserving access control framework is used to manage access control in relational databases. The access control policies define selection predicates available to roles, while the privacy requirement is to satisfy k-anonymity or l-diversity. An imprecision bound constraint is assigned to each selection predicate. k-anonymous Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access Control (RBAC) allows defining permissions on objects based on roles in an organization. The Top Down Selection Mondrian (TDSM) algorithm is used for query workload-based anonymization; it is constructed using greedy heuristics and a kd-tree model. Query cuts are selected with minimum bounds in the Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as partitions are added to the output in the Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in the Top-Down Heuristic 3 algorithm (TDH3). A repartitioning algorithm is used to reduce the total imprecision for the queries.

The privacy-preserving access privilege management scheme is enhanced to provide incremental mining features. Data insert, delete and update operations are connected with the partition management mechanism. Cell-level access control is provided with a differential privacy method. A dynamic role management model is integrated with the access control policy mechanism for query predicates.
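The top-down, kd-tree-style partitioning that TDSM and the TDH heuristics above build on is essentially the Mondrian algorithm: recursively split on the widest attribute at its median until a split would violate k-anonymity. A bare-bones sketch (the median split and widest-range choice are the standard Mondrian rules; the query-workload and imprecision-bound refinements of TDH1-TDH3 are omitted):

```python
def mondrian_partition(points, k):
    """Top-down Mondrian sketch: recursively split on the attribute
    with the widest range at its median, stopping when a split would
    leave fewer than k points on either side (k-anonymity)."""
    if len(points) < 2 * k:
        return [points]  # cannot split without breaking k-anonymity
    dims = range(len(points[0]))
    dim = max(dims, key=lambda d: max(p[d] for p in points)
                                  - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return (mondrian_partition(ordered[:mid], k)
            + mondrian_partition(ordered[mid:], k))

points = [(1, 1), (2, 8), (3, 2), (4, 9), (5, 1), (6, 8), (7, 3), (8, 7)]
parts = mondrian_partition(points, k=2)
```

Each resulting partition would then be generalized to its bounding box; the workload-aware heuristics differ only in how they pick the cut so that frequent queries suffer the least imprecision.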
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy — Kato Mivule
Genomic data provides clinical researchers with vast opportunities to study various patient ailments. Yet the same data contains revealing information, some of which a patient might want to remain concealed. The question then arises: how can an entity transact in full DNA data while concealing certain sensitive pieces of information in the genome sequence, and maintain DNA data utility? As a response to this question, we propose a codon frequency obfuscation heuristic, in which a redistribution of codon frequency values with highly expressed genes is done in the same amino acid group, generating an obfuscated DNA sequence. Our preliminary results show that it might be possible to publish an obfuscated DNA sequence with a desired level of similarity (utility) to the original DNA sequence. http://arxiv.org/abs/1405.5410
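The core move of the heuristic above, redistributing codon frequencies within the same amino acid group so the sequence changes but its encoding does not, can be sketched with a small synonymous-codon table; the three groups below are a subset of the standard genetic code, and the uniform random choice is a simplification of the paper's frequency redistribution:

```python
import random

# A few synonymous-codon groups (subset of the standard genetic code).
SYNONYMS = {
    "Leu": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}
CODON_TO_GROUP = {c: aa for aa, cs in SYNONYMS.items() for c in cs}

def obfuscate(sequence, seed=7):
    """Sketch of the heuristic: replace each codon with a randomly
    chosen synonymous codon, redistributing codon frequencies inside
    the same amino-acid group; the encoded protein is unchanged."""
    rng = random.Random(seed)
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    out = []
    for c in codons:
        group = CODON_TO_GROUP.get(c)
        out.append(rng.choice(SYNONYMS[group]) if group else c)
    return "".join(out)
```

Because every substitution stays inside its amino-acid group, the utility that survives is exactly the protein-level information, while the raw nucleotide sequence (and any identifying codon-usage signature) is perturbed.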
A Survey: Privacy Preserving Using Obfuscated Attribute In e-Health Cloud
Cloud computing nowadays provides numerous benefits to its users. As the cloud infrastructure is not directly under the control of the user, it is difficult for the user to have good security. On the other side, as the number of users grows, it becomes even more difficult to manage data in such a way that users' needs are satisfied efficiently, and there are many chances of user data being misused. Cloud providers therefore need to balance two fundamentals: privacy handling and efficient analysis of data. When the health records of a patient or medical firm are available on a remote machine, the privacy of the records is addressed by the anonymization fundamental; various researchers have proposed the t-closeness technique to achieve this goal. It is also important to secure the stored data using an obfuscation mechanism. Since full obfuscation of a file can consume more time, many researchers have proposed attribute-based obfuscation schemes, which lessen the burden on the cloud server by providing adequate security and also help to execute user queries faster. In this paper we aim to provide a survey of the various fundamentals given by different researchers.
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...
In today's world, a gigantic amount of data is available in science, industry, business and many other areas. This data can provide valuable information which can be used by management for making important decisions. The problem is how to find that valuable information; the answer is data mining, a popular topic among researchers with much work still unexplored. This paper focuses on a fundamental concept of data mining, i.e., classification techniques. The BayesNet, NaiveBayes, NaiveBayesUpdateable, Multilayer Perceptron, Voted Perceptron and J48 classifiers are used for the classification of a data set. The performance of these classifiers is analyzed with the help of Mean Absolute Error, Root Mean-Squared Error and the time taken to build the model, and the results are shown both statistically and graphically. For this purpose the WEKA data mining tool is used.
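The two error metrics used for the comparison above are standard and easy to state precisely; the toy labels and predictions below are illustrative, not from the paper's data set:

```python
import math

def mean_absolute_error(actual, predicted):
    """MAE: average absolute deviation between truth and prediction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """RMSE: square root of the mean squared deviation; penalizes
    large individual errors more heavily than MAE does."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

actual    = [1, 0, 1, 1, 0]
predicted = [0.9, 0.2, 0.6, 1.0, 0.1]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

RMSE is always at least MAE, and the gap between the two signals whether a classifier's errors are uniform (small gap) or dominated by a few bad misses (large gap).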
An Investigation of Data Privacy and Utility Preservation Using KNN Classific... — Kato Mivule
Kato Mivule and Claude Turner, An Investigation of Data Privacy and Utility Preservation Using KNN Classification as a Gauge, International Conference on Information and Knowledge Engineering (IKE 2013), July 22-25, Pages 203-204, Las Vegas, NV, USA
PRIVACY PRESERVING CLUSTERING IN DATA MINING USING VQ CODE BOOK GENERATION
Huge volumes of detailed personal data are regularly collected and analyzed by applications using data mining, and sharing these data is beneficial to the application users. On one hand the data are an important asset to business organizations and governments for decision making; at the same time, analyzing such data opens threats to privacy if not done properly. This paper aims to reveal the information while protecting sensitive data. We use the vector quantization technique for preserving privacy: quantization is performed on training data samples to produce a transformed data set, and this transformed data set does not reveal the original data. Hence privacy is preserved.
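The codebook-generation step behind the vector quantization approach above is essentially k-means: learn a small set of codewords, then publish each sample as its nearest codeword so no original value appears in the released data. A 1-D sketch (the data, k, and iteration count are illustrative):

```python
import random

def vq_codebook(samples, k=2, iters=10, seed=0):
    """1-D k-means sketch: learn k codebook vectors (codewords)."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in samples:
            clusters[min(range(k),
                         key=lambda i: abs(s - centers[i]))].append(s)
        # move each center to its cluster mean (keep it if cluster empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def quantize(samples, centers):
    """Publish each sample as its nearest codeword: the transformed
    data set that stands in for the originals."""
    return [min(centers, key=lambda c: abs(s - c)) for s in samples]

samples = [1, 2, 3, 10, 11, 12]
centers = vq_codebook(samples)
released = quantize(samples, centers)
```

Only the cluster-level structure survives in `released`, which is why the transformed set supports clustering-style analysis without exposing any individual original value.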
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin... — Kato Mivule
Kato Mivule, Claude Turner, "A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Using Machine Learning Classification as a Gauge", Procedia Computer Science, Volume 20, 2013, Pages 414-419, Baltimore MD, USA
Kato Mivule - Utilizing Noise Addition for Data Privacy, an Overview
Kato Mivule, "Utilizing Noise Addition for Data Privacy, an Overview", Proceedings of the International Conference on Information and Knowledge Engineering (IKE 2012), Pages 65-71, Las Vegas, NV, USA.
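The basic mechanism surveyed in the overview above, releasing X + e for random noise e so that individual values are masked while aggregate statistics survive, can be shown in a few lines; Gaussian noise, the sigma value, and the salary data are illustrative choices, not from the paper:

```python
import random
import statistics

def perturb(values, sigma=2.0, seed=3):
    """Additive noise sketch: release X + e with e ~ N(0, sigma^2).
    Each record is masked, but sums and means are nearly preserved
    because the noise has zero mean."""
    rng = random.Random(seed)
    return [v + rng.gauss(0, sigma) for v in values]

salaries = [50, 52, 48, 51, 49] * 40          # 200 records
noisy = perturb(salaries)
# aggregate utility: the mean drifts only by O(sigma / sqrt(n))
drift = abs(statistics.mean(noisy) - statistics.mean(salaries))
```

Choosing sigma is the whole game: larger noise gives each individual more cover but degrades every statistic computed downstream, which is the privacy/utility tension the overview discusses.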
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. The existing system proposed the slicing concept to overcome tuple-based partitioning and thereby the limitations of generalization and bucketization. In this paper, we present a novel technique called rule-based slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obeys the l-diversity requirement. The workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. The experiments also demonstrate that slicing can be used to prevent membership disclosure.
Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply for data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.
Cluster Based Access Privilege Management Scheme for DatabasesEditor IJMTER
Knowledge discovery is carried out using the data mining techniques. Association rule mining,
classification and clustering operations are carried out under data mining. Clustering method is used to group up the
records based on the relevancy. Distance or similarity measures are used to estimate the transaction relationship.
Census data and medical data are referred as micro data. Data publish schemes are used to provide private data for
analysis. Privacy preservation is used to protect private data values. Anonymity is considered in the privacy
preservation process.
Data values are allowed to authorized users using the access control models. Privacy Protection Mechanism
(PPM) uses suppression and generalization of relational data to anonymize and satisfy privacy needs. Accuracyconstrained privacy-preserving access control framework is used to manage access control in relational database. The
access control policies define selection predicates available to roles while the privacy requirement is to satisfy the kanonymity or l-diversity. Imprecision bound constraint is assigned for each selection predicate. k-anonymous
Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access
Control (RBAC) allows defining permissions on objects based on roles in an organization. Top Down Selection
Mondrian (TDSM) algorithm is used for query workload-based anonymization. The Top Down Selection Mondrian
(TDSM) algorithm is constructed using greedy heuristics and kd-tree model. Query cuts are selected with minimum
bounds in Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as the partitions are added to the
output in Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in TopDown Heuristic 3 algorithm (TDH3). Repartitioning algorithm is used to reduce the total imprecision for the queries.
The privacy preserved access privilege management scheme is enhanced to provide incremental mining
features. Data insert, delete and update operations are connected with the partition management mechanism. Cell level
access control is provided with differential privacy method. Dynamic role management model is integrated with the
access control policy mechanism for query predicates.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data PrivacyKato Mivule
Genomic data provides clinical researchers with vast opportunities to study various patient ailments. Yet the same data contains revealing information, some of which a patient might want to remain concealed. The question then arises: how can an entity transact in full DNA data while concealing certain sensitive pieces of information in the genome sequence, and maintain DNA data utility? As a response to this question, we propose a codon frequency obfuscation heuristic, in which a redistribution of codon frequency values with highly expressed genes is done in the same amino acid group, generating an obfuscated DNA sequence. Our preliminary results show that it might be possible to publish an obfuscated DNA sequence with a desired level of similarity (utility) to the original DNA sequence. http://arxiv.org/abs/1405.5410
A Survey: Privacy Preserving Using Obfuscated Attribute In e-Health Cloudrahulmonikasharma
Cloud computing now a day’s provides numerous number of benefits to their users. As the Cloud infrastructure is not directly under control of user its seems to be difficult for user to have a better security. Other side as the number of user grow even it become more difficult to manage a data such a way that user needs for any data are satisfied efficiently. There are lots of chances to misuse the data of user. So, here Cloud providers need to balance this two fundamental of Privacy handling and efficient analysis of data together is become very important. When we talk about the health records of patient or medical firm and available on remote machine issue of privacy of record provided by the anonymization fundamental. Here various researcher provided a technique T- Closeness to achieve this goal. It also important to provide the security of stored data using obfuscation mechanism . Some time full obfuscation of file consume more time so many researcher provided scheme of attribute based obfuscation which lessen the burden of Cloud server by providing adequate security and also help to execute user query faster. In this paper we aim to provide survey on various fundamental given by the different researcher.
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...cscpconf
In today’s world, gigantic amount of data is available in science, industry, business and many
other areas. This data can provide valuable information which can be used by management for
making important decisions. But problem is that how can find valuable information. The answer
is data mining. Data Mining is popular topic among researchers. There is lot of work that
cannot be explored till now. But, this paper focuses on the fundamental concept of the Data mining i.e. Classification Techniques. In this paper BayesNet, NavieBayes, NavieBayes Uptable, Multilayer perceptron, Voted perceptron and J48 classifiers are used for the classification of data set. The performance of these classifiers analyzed with the help of Mean Absolute Error, Root Mean-Squared Error and Time Taken to build the model and the result can be shown statistical as well as graphically. For this purpose the WEKA data mining tool is used.
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...Kato Mivule
Kato Mivule and Claude Turner, An Investigation of Data Privacy and Utility Preservation Using KNN Classification as a Gauge, International Conference on Information and Knowledge Engineering (IKE 2013), July 22-25, Pages 203-204, Las Vegas, NV, USA
PRIVACY PRESERVING CLUSTERING IN DATA MINING USING VQ CODE BOOK GENERATION (cscpconf)
Huge volumes of detailed personal data are regularly collected and analyzed by applications using data mining, and sharing these data is beneficial to the application users. On one hand the data are an important asset to business organizations and governments for decision making; at the same time, analyzing such data poses threats to privacy if not done properly. This paper aims to reveal the information while protecting sensitive data. We use a vector quantization technique for preserving privacy: quantization is performed on the training data samples to produce a transformed data set. This transformed data set does not reveal the original data; hence privacy is preserved.
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin... (Kato Mivule)
Kato Mivule, Claude Turner, "A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Using Machine Learning Classification as a Gauge", Procedia Computer Science, Volume 20, 2013, Pages 414-419, Baltimore MD, USA
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Utilizing Noise Addition for Data Privacy, an Overview (Kato Mivule)
Kato Mivule, "Utilizing Noise Addition for Data Privacy, an Overview", Proceedings of the International Conference on Information and Knowledge Engineering (IKE 2012), Pages 65-71, Las Vegas, NV, USA.
A Rule based Slicing Approach to Achieve Data Publishing and Privacy (ijsrd.com)
Several anonymization techniques, such as generalization and bucketization, have been designed for privacy-preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that lack a clear separation between quasi-identifying attributes and sensitive attributes. The existing system proposed the slicing concept to overcome tuple-based partitioning and the shortcomings of generalization and bucketization. In this paper we present a novel technique called rule-based slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obeys the l-diversity requirement. The workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. The experiments also demonstrate that slicing can be used to prevent membership disclosure.
Data Transformation Technique for Protecting Private Information in Privacy P... (acijjournal)
Data mining is the process of extracting patterns from data. It is seen as an increasingly important tool by modern business to transform data into an informational advantage, and it can be utilized in any organization that needs to find patterns or relationships in its data. In many situations, the extracted patterns are highly private and should not be disclosed. To maintain the secrecy of data, several techniques and algorithms are needed for modifying the original data so as to limit the extraction of confidential patterns. There have been two types of privacy in data mining. In the first, the data are altered so that the mining result preserves certain privacy. In the second, the data are manipulated so that the mining result is not affected, or only minimally affected. The aim of privacy-preserving data mining research is to develop data mining techniques that can be applied to databases without violating the privacy of individuals. Many techniques for privacy-preserving data mining have emerged over the last decade, including statistical and cryptographic methods, randomization, the k-anonymity model, and l-diversity. In this work we propose a new perturbative masking technique, known as the data transformation technique, that can be used to protect sensitive information. Experimental results show that the proposed technique gives better results than the existing technique.
Anonymization techniques are used to ensure the privacy preservation of the data owners, especially for personal and sensitive data. While in most cases, data reside inside the database management system; most of the proposed anonymization techniques operate on and anonymize isolated datasets stored outside the DBMS. Hence, most of the desired functionalities of the DBMS are lost, e.g., consistency, recoverability, and efficient querying. In this paper, we address the challenges involved in enforcing the data privacy inside the DBMS. We implement the k-anonymity algorithm as a relational operator that interacts with other query operators to apply the privacy requirements while querying the data. We study anonymizing a single table, multiple tables, and complex queries that involve multiple predicates. We propose several algorithms to implement the anonymization operator that allow efficient non-blocking and pipelined execution of the query plan. We introduce the concept of k-anonymity view as an abstraction to treat k-anonymity (possibly, with multiple k preferences) as a relational view over the base table(s). For non-static datasets, we introduce the materialized k-anonymity views to ensure preserving the privacy under incremental updates. A prototype system is realized based on PostgreSQL with extended SQL and new relational operators to support anonymity views. The prototype system demonstrates how anonymity views integrate with other privacy- preserving components, e.g., limited retention, limited disclosure, and privacy policy management. Our experiments, on both synthetic and real datasets, illustrate the performance gain from the anonymity views as well as the proposed query optimization techniques under various scenarios.
Data Anonymization Process Challenges and Context Missions (ijdms)
Data anonymization is one of the solutions allowing companies to comply with the GDPR directive in terms of data protection. In this context, developers must follow several steps in the process of data anonymization in development and testing environments. Indeed, real personal and sensitive data must not leave the production environment which is very secure. Often, anonymization experts are faced with difficulties including the lack of data flows and mapping between data sources, the non-cooperation of the database project teams (refusal to change) or even the lack of skills of these teams present due to the age of the systems developed by experienced teams who unfortunately left the project. Other problems are lack of data models. The aim of this paper is to discuss an anonymization process of databases of banking applications and present our context-based recommendations to overcome the different issues met and the solutions to improve methodologies of data anonymization process.
A Comparative Study on Privacy Preserving Datamining Techniques (IJMER)
Privacy protection has become very important in recent years because of the increasing ability to store data. In particular, recent advances in the data mining field have led to increased concerns about privacy. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data will violate individual privacy. The current practice in data publishing is based on what type of data can be released and how that data is used. Recently, PPDM has received immense attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this comparative study we systematically summarize and evaluate different approaches to PPDM, study the challenges, differences and requirements that distinguish PPDM from other related problems, and propose future research directions.
AN EFFICIENT SOLUTION FOR PRIVACY-PRESERVING, SECURE REMOTE ACCESS TO SENSITIV... (cscpconf)
Sharing data that contains personally identifiable or sensitive information, such as medical
records, always has privacy and security implications. The issues can become rather complex
when the methods of access can vary, and accurate individual data needs to be provided whilst
mass data release for specific purposes (for example for medical research) also has to be
catered for. Although various solutions have been proposed to address the different aspects
individually, a comprehensive approach is highly desirable. This paper presents a solution for
maintaining the privacy of data released en masse in a controlled manner, and for providing
secure access to the original data for authorized users. The results show that the solution is provably secure and maintains privacy in a more efficient manner than previous solutions
A survey on privacy preserving data publishing (ijcisjournal)
Data mining is a computational process of analysing and extracting data from large, useful datasets. In recent years, exchanging and publishing data has become common because of the wealth of opportunities it offers. Security, privacy and data integrity are considered challenging problems in data mining. Privacy is necessary to protect people's interests in competitive situations; it is the ability to create and maintain different sorts of social relationships with people. Privacy preservation is one of the most important factors for an individual, who should not be embarrassed by an adversary. Privacy preservation is an important aspect of data mining that ensures privacy by various methods, and it is necessary to protect sensitive information associated with individuals. This paper provides a survey of the keys to success and an approach in which an individual's privacy would not be disturbed.
A Survey Paper on an Integrated Approach for Privacy Preserving In High Dimen... (IJSRD)
Data mining is a technique used for the extraction of knowledge and information from the large amounts of data collected by hospitals, governments and individuals; the term is also referred to as knowledge mining from databases. The major challenge in data mining is ensuring the security and privacy of data in databases, because data sharing is common at the organizational level. The data in databases come from a number of sources, such as medical, financial, library, marketing and shopping records, so keeping that data secure is a foremost task. The objective is to achieve fully privacy-preserved data without affecting the data utility in databases, i.e., data are used or transferred between organizations so that data integrity remains in the database but sensitive and confidential data are preserved. This paper presents a brief study of different PPDM techniques, such as randomization, perturbation, slicing and summarization, by means of which data privacy can be preserved. The technique that achieves the best computational and theoretical outcome is chosen for privacy preserving in high-dimensional data.
Privacy Preservation and Restoration of Data Using Unrealized Data Sets (IJERA Editor)
In today's world, advances in hardware technology have increased the capability to store and record personal data about consumers and individuals. Data mining extracts knowledge to successfully support a variety of areas such as marketing, medical diagnosis, weather forecasting and national security. Still, there is a challenge in extracting certain kinds of data without violating the data owners' privacy, and as data mining becomes more pervasive, such privacy concerns are increasing. This gives birth to a new category of data mining methods called privacy-preserving data mining (PPDM) algorithms, whose aim is to protect the sensitive information in a large data set. The privacy preservation of a data set can be expressed in the form of a decision tree. This paper proposes a privacy preservation method based on data-set complement algorithms which store the information of the real dataset, so that the private data are safe from unauthorized parties; if some portion of the data is lost, the original data set can be recreated from the unrealized dataset and the perturbed data set.
Two-Phase TDS Approach for Data Anonymization To Preserving Bigdata Privacy (dbpublications)
While Big Data has gradually become a hot topic of research and business and is used everywhere in many industries, Big Data security and privacy have become an increasing concern. However, there is an obvious tension between Big Data security and privacy and the widespread use of Big Data. Various privacy-preserving mechanisms have been developed for protecting privacy at different stages of the big data life cycle (e.g., data generation, data storage, data processing). The goal of this paper is to provide a complete overview of the privacy preservation mechanisms in big data and to present the challenges for existing mechanisms; we also illustrate the infrastructure of big data and the state-of-the-art privacy-preserving mechanisms in each stage of the big data life cycle. This paper focuses on the anonymization process, which significantly improves the scalability and efficiency of TDS (top-down specialization) for data anonymization over existing approaches. We also discuss the challenges and future research directions related to preserving privacy in big data.
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 12, Issue 3 (Jul. - Aug. 2013), PP 44-51
www.iosrjournals.org

Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud

Ms. Swati Ganar*1, Apeksha Sakhare*2
Department of Computer Science & Engineering
G.H. Raisoni College of Engineering, Nagpur
Abstract: Cloud computing is a model that enables convenient, on-demand network access to a shared pool of configurable computing resources, where millions of users share an infrastructure. Privacy and security are significant obstacles preventing the extensive adoption of the public cloud in industry. Researchers have developed privacy models such as k-anonymity, l-diversity and t-closeness. However, even when these privacy models are applied, an attacker may still be able to access some confidential data if the same sensitive labels are used by a group of nodes. Publishing data about individuals without revealing sensitive information about them is an important problem. Data anonymization is a method that makes data worthless to anyone except the owner of the data; it is one of the methods for transforming data so that identification of key information by an unauthorized person is prevented. We survey the existing methods of anonymization used to protect sensitive information stored in the cloud. Data can also be anonymized using techniques such as hashing, hiding, permutation, shifting, truncation, prefix-preserving anonymization and enumeration. We have implemented these methods to observe the anonymization effect, and we have also implemented a new method for anonymization.
Keywords: Anonymization, Deanonymization, Data Hiding, Hash Calculation, Data Shifting, Data Truncation, Data Enumeration, Data Permutation, IP Prefix Preserving.
I. INTRODUCTION
Cloud computing is a model that enables convenient, on-demand network access to a shared pool of configurable computing resources, where millions of users share an infrastructure. It offers many potential benefits to small and medium-sized enterprises (SMEs). It provides many services, including:
- data processing
- storage and backup
- productivity facilities
- accounting services
- communications
- customer service and support
Cloud computing avoids some common security breaches, because it does not involve backup media to lose or unsecured connections to hijack or eavesdrop on.
However, the question of privacy or confidentiality arises whenever a user shares information in the cloud. Public and private organizations publish their databases to the cloud for research or other purposes. Such a database contains sensitive information about many people and is an information resource for research and analysis; it may help a hospital track its patients, a school monitor its students, or a bank its customers. The privacy of this data must be preserved while disclosing it to a third party or while placing it in long-term storage, i.e., no sensitive information should be disclosed. To reduce or eliminate this privacy risk, a method called anonymization is used.
Anonymization is a privacy-preserving technique that manipulates the information, making identification of the data difficult for anybody except the owners [1]. It is different from data encryption. Anonymization removes identifying attributes, such as names or social security numbers, from the database. For example, a school will delete the student ID and a bank will remove the account number.
Anonymization has three primary goals [2]:
- to protect the identities of specific users from being leaked
- to protect the identities of internal users from being revealed
- to protect specific security practices of organizations from being revealed.
Submitted: 10 June 2013. Accepted: 15 June 2013.
Experts have developed different anonymization techniques, varying in cost, complexity, ease of use and robustness, to achieve these goals [2][3]. Suppression [4] is a very common method of anonymization, performed by deleting or omitting the data entirely. For example, a hospital administrator tracking prescriptions will suppress patients' names before sharing data. To protect sensitive values, generalization [4] techniques can also be used. This technique replaces quasi-identifier attributes with less specific values: it divides the tuples into quasi-identifier groups (QI groups) and generalizes the values in every group to a uniform format. For example, the data in a microdata table can be generalized using the k-anonymization technique. To effectively limit information disclosure, it is necessary to measure the disclosure risk of the anonymized table.
Different techniques are required to anonymize qualitative and quantitative information. Some methods are as follows:
- removing an individual's name from a document
- blurring images to disguise faces
- modifying or re-recording audio files
- modifying reports
A simple example of data anonymization is given below. The aim is to find the total turnover of some companies whose names are kept secret. For this purpose, the names of the companies are changed in the cloud-based data. At the same time, some fictitious information is also added to the cloud-based data, and a secure mapping table is generated to link original and fictitious data. When the total turnover is calculated in the cloud, the result obtained is incorrect; this incorrect result is then corrected using the secure mapping table [1].
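A minimal Python sketch of this mapping-table scheme (the company names, turnover figures and fictitious entries are all hypothetical; the cloud side only ever sees pseudonyms plus fictitious rows):

```python
import secrets

# Owner-side data: company name -> annual turnover (hypothetical figures).
original = {"Acme": 120, "Globex": 80, "Initech": 200}

# 1. Replace real names with random pseudonyms and keep the mapping secret.
mapping = {name: secrets.token_hex(4) for name in original}
cloud_data = {mapping[name]: turnover for name, turnover in original.items()}

# 2. Add fictitious records so the cloud cannot infer the true totals.
fictitious = {"deadbeef": 50, "cafef00d": 30}
cloud_data.update(fictitious)

# 3. The cloud computes an aggregate over everything it stores (incorrect).
cloud_total = sum(cloud_data.values())

# 4. The owner corrects the result using the secure mapping table,
#    subtracting the contribution of the fictitious rows.
true_total = cloud_total - sum(fictitious.values())

print(cloud_total, true_total)  # 480 400
```

The cloud thus works only on pseudonymous, padded data, while the owner can always recover the correct aggregate.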
The anonymization procedure can be reversed; this is termed re-identification or deanonymization: an adversary links the anonymized records to outside data and tries to re-identify the anonymized data. Re-identification can be done in two ways:
- the adversary takes personal data and searches an anonymized dataset for a match;
- the adversary takes a record from an anonymized dataset and searches publicly available information for a match.
The rest of this paper is organized as follows. Definitions are given in Section II. Section III reviews the related work, and Section IV contains the problem definition. Anonymization techniques are presented in Section V, and Section VI consists of the methodology and results. Finally, Section VII concludes the paper and gives suggestions for future work.
Fig. 1: Data Anonymization in Cloud
II. DEFINITIONS
The database, also called microdata, is stored in a table with multiple records. The attributes of these records may be categorized as follows:
- explicit identifiers
- quasi identifiers
- sensitive identifiers
Explicit identifiers are attributes that identify an individual directly, e.g., name or social security number.
Quasi identifiers are attributes that can be linked with other information to identify an individual in the population, e.g., gender, birth date, zip code or diagnosis.
A sensitive identifier is an attribute with a sensitive value; its value must not be disclosed to any individual.
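The three categories can be illustrated with a small sketch (the record, the field names and the category assignment are hypothetical): explicit identifiers are dropped outright, quasi identifiers are kept for later generalization, and sensitive values stay for analysis.

```python
# Hypothetical attribute categories for a hospital table.
EXPLICIT = {"name", "ssn"}
QUASI = {"gender", "birth_date", "zip"}
SENSITIVE = {"diagnosis"}

record = {"name": "A. Smith", "ssn": "123-45-6789",
          "gender": "F", "birth_date": "1980-04-02",
          "zip": "44139", "diagnosis": "flu"}

# A first anonymization step: release the record without explicit identifiers.
released = {k: v for k, v in record.items() if k not in EXPLICIT}
print(released)
```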
Fig. 2: Anonymization techniques
The hierarchy in Fig. 2 shows the different techniques that are used for anonymization. The sensitive information is protected using a variety of techniques, including data swapping, data perturbation, data recoding, data suppression, data aggregation, data generalization, sampling and rounding.
Most anonymization work has been done on static datasets, but real datasets are dynamic, so dynamic anonymization is required. Dynamic datasets are made complex by data updates. Data updates can be either external or internal [5]: an external update adds or removes records in the dataset, whereas an internal update changes an attribute value of a record.
There is always a correlation between the old value and the new value of a record. For example, suppose a person's current salary in one particular organization is 4.5 lakhs per annum. Several years later, even if we cannot determine her/his highest salary without complementary knowledge, we can conclude that it will not be lower than 4.5 lakhs per annum and will be one of {6 lakhs, 8 lakhs, more than 8 lakhs} with different nonzero probabilities.
III. RELATED WORK
Many techniques are available to anonymize data. Some security models have also been used to improve data anonymization, such as k-anonymity, l-diversity and t-closeness.
Samarati [6] and Sweeney [7] introduced k-anonymity as the property that each record is indistinguishable from a defined number (k) of others if attempts are made to identify the data: for any record with a given combination of quasi-identifier values, there are at least k-1 other records that match those values. k-anonymity can prevent only identity disclosure; it cannot prevent disclosure of attribute information.
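The property can be checked directly by grouping records on their quasi-identifier tuples (a minimal sketch; the four-row table and the choice of quasi identifiers are hypothetical):

```python
from collections import Counter

def is_k_anonymous(table, quasi_ids, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in table)
    return all(count >= k for count in groups.values())

table = [
    {"zip": "441**", "age": "20-30", "diagnosis": "flu"},
    {"zip": "441**", "age": "20-30", "diagnosis": "cold"},
    {"zip": "442**", "age": "30-40", "diagnosis": "flu"},
    {"zip": "442**", "age": "30-40", "diagnosis": "flu"},
]

print(is_k_anonymous(table, ["zip", "age"], 2))  # True
```

With k = 3 the same table fails, since each QI group contains only two records.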
Machanavajjhala et al. [8] introduced a new model, called l-diversity, which requires that there be l different sensitive values for each combination of quasi identifiers. An equivalence class is said to have l-diversity if there are at least l "well-represented" values for the sensitive attribute, and a table has l-diversity if every equivalence class of the table has l-diversity. Like k-anonymity, l-diversity does not fully prevent attribute disclosure, and some attacks on it are possible, such as the skewness attack and the similarity attack. Information leakage occurs under l-diversity because it does not consider the semantic closeness of sensitive values.
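Distinct l-diversity, the simplest reading of "well-represented", can be checked as follows (a minimal sketch; the table and attribute names are hypothetical):

```python
from collections import defaultdict

def is_l_diverse(table, quasi_ids, sensitive, l):
    """True if every equivalence class contains at least l distinct
    sensitive values (distinct l-diversity, the simplest variant)."""
    classes = defaultdict(set)
    for row in table:
        classes[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return all(len(values) >= l for values in classes.values())

table = [
    {"zip": "441**", "age": "20-30", "diagnosis": "flu"},
    {"zip": "441**", "age": "20-30", "diagnosis": "cold"},
    {"zip": "442**", "age": "30-40", "diagnosis": "flu"},
    {"zip": "442**", "age": "30-40", "diagnosis": "flu"},
]

print(is_l_diverse(table, ["zip", "age"], "diagnosis", 2))  # False
```

Note that this table is 2-anonymous yet fails 2-diversity: the second equivalence class carries only one sensitive value, so an adversary who places a person in it learns the diagnosis.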
Ninghui Li, Tiancheng Li and Suresh Venkatasubramanian [5][9] proposed a new privacy model known as t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). t-closeness uses the Earth Mover's Distance (EMD) to calculate the distance between the two distributions [2], and it also considers the semantic closeness of attribute values; the EMD can be computed as the solution of a transportation problem. t-closeness prevents attribute disclosure, but it cannot protect the dataset against identity disclosure. In [5], the authors used the Mondrian algorithm, in which the high-dimensional space is divided into regions and the data points in one particular region are encoded by the region's representative.
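For an ordered sensitive attribute, the EMD reduces to a sum of cumulative differences rather than a full transportation problem; the sketch below follows the ordinal-distance formulation from the t-closeness paper (the distributions and the threshold t are hypothetical):

```python
from itertools import accumulate

def emd_ordinal(p, q):
    """Earth Mover's Distance between two distributions over an ordered
    attribute with m values, using the ordinal ground distance
    |i - j| / (m - 1): EMD = (1/(m-1)) * sum_i |cumulative(p - q)_i|."""
    m = len(p)
    diffs = [pi - qi for pi, qi in zip(p, q)]
    return sum(abs(c) for c in accumulate(diffs)) / (m - 1)

# Distribution of a sensitive attribute in one equivalence class vs. the
# whole table (hypothetical figures over 3 ordered value bands).
cls = [0.5, 0.5, 0.0]
whole = [1/3, 1/3, 1/3]
t = 0.2
print(emd_ordinal(cls, whole) <= t)  # False: the distance is 0.25 > t
```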
To protect the privacy of the database, some other techniques have been utilized. These techniques are given below [10]:
Removing identifying information: This is the simplest method of anonymization. Here, a field that is used to identify a specific individual is removed; the field may be the name, an ID or some other field that is highly sensitive in the context of the data [11]. Depending on the context, which field is to be
removed is decided. For example, the patient name is removed from a hospital database because this field identifies many persons. Fig. 3 displays an original hospital database, whereas Fig. 4 gives an anonymized version.
Fig. 3: Original database
Removing a particular record from the database also gives good protection for sensitive data.
Fig. 4: Removing the ID field
Suppression: Suppression consists of replacing value of variables with missing value. Or removing the
fields. The aim of this method is to reduce the information content. In [12], depending on the violation of
sensitive attribute, four different Suppression schemes have been suggested as follows:
o Delete all violating sensitive values and replace them with unknown value
o Delete all sensitive values
o Delete minimum number of records which violate sensitive value
o Delete all records
Suppression of 3 columns is shown in fig.5.
Fig.5: Suppressing 3 fields
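The first of the four schemes above can be sketched as follows (the field name, violating value and marker are illustrative assumptions, not from the paper):

```python
def suppress_values(records, field, violating, marker="*"):
    """Scheme 1: replace every violating sensitive value with an
    unknown-value marker; non-violating records are left untouched."""
    return [{**r, field: marker} if r[field] in violating else dict(r)
            for r in records]

rows = [{"Disease": "Flu"}, {"Disease": "HIV"}]
suppressed = suppress_values(rows, "Disease", {"HIV"})
```

The other schemes (delete all sensitive values, delete a minimal set of violating records, delete all records) trade progressively more utility for privacy.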
Generalization: This technique replaces quasi-identifier attributes with less specific values. The rationale of generalization is to partition the tuples on their quasi-identifier fields and generalize those fields into a uniform format. Privacy is preserved in the generalized table iff it satisfies the generalization principle. For example, a birth date may be generalized to the year of birth only. Fig.6 shows generalization of the Zip code field.
Fig.6: Generalized database (Zip field)
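Two generalizations mentioned above, zip-code masking and birth-date-to-year, can be sketched like this (the masking depth and date format are assumptions for illustration):

```python
def generalize_zip(zip_code, keep=3):
    """Replace a zip code with a less specific value: keep the first
    `keep` digits and mask the rest, e.g. "44126" -> "441**"."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def generalize_birthdate(iso_date):
    """Generalize a full birth date ("1990-04-12") to the year only."""
    return iso_date.split("-")[0]
```

All zip codes sharing a 3-digit prefix fall into the same generalized class, which is exactly what partitioning on quasi-identifiers requires.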
Aggregation: This method releases aggregate statistics of a database or field; only summary statistics are given. For example, using aggregation it is possible to learn how many persons are suffering from flu without seeing any individual record. Fig.7 gives an aggregated value from the database.
Fig.7: Aggregated database
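A sketch of releasing only per-value counts (the sample records are hypothetical):

```python
from collections import Counter

def aggregate_counts(records, field):
    """Release only summary statistics: a count per distinct value."""
    return Counter(r[field] for r in records)

patients = [{"Disease": "Flu"}, {"Disease": "Flu"}, {"Disease": "Cold"}]
counts = aggregate_counts(patients, "Disease")
```

The analyst sees that two persons have flu, but no individual record is exposed.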
IV. PROBLEM DEFINITION
The data is often not protected while it is being used. Regardless of where the data is stored or transferred, data security is important, and the data may be raw data that contains highly sensitive information. Many cloud users worry that the cloud service provider may sell their data to another provider, in which case some of their sensitive data would be disclosed. Similarly, cloud service providers may use the data (e.g. images) for advertisement.
Given an anonymization, adding a new value performs specialization on the data, whereas removing a value performs generalization. The aim here is not just to anonymize the data, but to achieve a good anonymization with respect to its cost, complexity and robustness. To this end, we have implemented the following anonymization techniques and have also proposed a new one.
V. ANONYMIZATION TECHNIQUES
Different vulnerabilities are associated with different types of anonymization. Several techniques are available to anonymize data, such as encryption, substitution, shuffling, number and date variance, and nulling out fields. We have implemented some of these anonymization techniques to obscure data in a database.
1. Data Hiding:
It suppresses a data value by replacing it with the constant '0'. It is also called black-marker anonymization. The advantage of hiding is that the number of records is preserved after anonymization. For example, in a hospital database the age of a patient may not be required for processing, so it is replaced with the constant '0'.
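A minimal sketch of black-marker hiding (field name and records are illustrative):

```python
def hide_field(records, field, marker=0):
    """Black-marker anonymization: overwrite the field with the constant 0.
    The record count is unchanged, so row-level statistics still work."""
    return [{**r, field: marker} for r in records]
```

Calling `hide_field(rows, "Age")` keeps every row but blanks the Age column.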
2. Hash Calculation:
It computes a hash value of one field or of several fields. It takes a variable-length input and produces a fixed-size hash of the input. MD5 or SHA can be used. For example, the hash of the first name and last name can be calculated. A simple polynomial hash function is defined as

H(S) = H("S1 S2 ... Sn") = S1 + p*S2 + p^2*S3 + ... + p^(n-1)*Sn
H(T) = H("T1 T2 ... Tn") = T1 + p*T2 + p^2*T3 + ... + p^(n-1)*Tn

where S is the first name, T is the last name, and p is a prime number used as the multiplier.
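The polynomial hash above can be sketched as follows; reducing modulo a large prime (an assumption added here, so the result has a fixed size) keeps the value bounded, and characters stand in for the symbols S1..Sn via their code points:

```python
def poly_hash(s, p=31, mod=(1 << 61) - 1):
    """H(S) = S1 + p*S2 + p^2*S3 + ... + p^(n-1)*Sn (mod a large prime).
    Each character Si is taken as its Unicode code point."""
    h = 0
    for i, ch in enumerate(s):
        h = (h + ord(ch) * pow(p, i, mod)) % mod
    return h

# e.g. poly_hash("Swati") and poly_hash("Ganar") anonymize the two name fields
```

In practice a cryptographic hash such as SHA-256 is preferable, since a polynomial hash over a small alphabet is easy to invert by brute force.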
3. Shifting:
Shifting shifts a field or data value by a specific amount: it adds an offset t to the data value. The shift value is the only key to the shift function, so it is kept secret. For example, an offset of t = 10 is added to the Age field.
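A sketch of shifting with the secret offset t = 10 from the example:

```python
T = 10  # secret shift amount; the only key to the scheme

def shift(value, t=T):
    """Anonymize by adding the secret offset."""
    return value + t

def unshift(value, t=T):
    """Knowing t, the anonymization is exactly reversible."""
    return value - t
```

Reversibility is both the strength (no information loss) and the weakness (anyone who learns t can deanonymize) noted in Table 1.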
4. Data Truncation:
It removes the n least significant digits (or bits) from a numerical field [13]. Although the data at the end is lost, the leading part of the value is preserved. For example, the telephone number of a doctor is truncated so that only the first 3 digits are retained: 0712345345 is stored as 071.
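Truncation can be sketched as a simple string operation; treating the number as a string (an implementation choice made here) keeps leading zeros intact:

```python
def truncate(number, keep=3):
    """Drop the least significant digits, keeping only the first `keep`.
    The value is handled as a string so leading zeros survive."""
    return str(number)[:keep]
```

For example, `truncate("0712345345")` yields the area prefix while discarding the subscriber digits.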
5. Data Permutation:
Permutation is a substitution technique: it replaces each original value with a new unique value, selected at random. Because the mapping is one-to-one, the substitution is collision-free. The number of possible arrangements of r values chosen from n is
P(n, r) = n! / (n - r)!
In our case, the combination of first name and last name is permuted.
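One way to build such a collision-free substitution (a sketch; the seed and sample names are illustrative) is to shuffle the value set itself, which is by construction a one-to-one mapping:

```python
import random

def permutation_map(values, seed=None):
    """Random one-to-one substitution: every original value is replaced
    by a unique value drawn from the same set, so there are no collisions."""
    substitutes = list(values)
    random.Random(seed).shuffle(substitutes)
    return dict(zip(values, substitutes))

names = ["Swati Ganar", "Reetu Ganar", "Praful Ganar"]
mapping = permutation_map(names, seed=42)
```

The seed plays the role of a secret key: without it, the substitution cannot be reproduced.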
6. Data Enumeration:
Enumeration is also a substitution technique. It retains the chronological order in which events take place, so it is useful for applications that demand a strict sequencing order. For example, the Salary field x is enumerated while its ordering is maintained.
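One possible enumeration (a sketch; the start/step values are assumptions, not the paper's parameters) replaces each salary with a synthetic number that preserves only the rank order:

```python
def enumerate_field(values, start=100, step=100):
    """Replace each value with a synthetic number that preserves only the
    rank order of the originals (e.g. for a Salary field)."""
    rank = {v: i for i, v in enumerate(sorted(set(values)))}
    return [start + step * rank[v] for v in values]
```

Smaller salaries map to smaller synthetic values, so order-based queries still work while the real amounts are hidden.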
7. IP Prefix-Preserving:
Since IP addresses are unique, they can identify a person, an organization or a host; therefore IP address anonymization is necessary. This method preserves n-bit prefixes: two anonymized IP addresses match on a prefix of n bits if and only if the two real IP addresses match on a prefix of n bits. That is, addresses a and b share an n-bit prefix if a1 a2 ... an = b1 b2 ... bn and a(n+1) ≠ b(n+1).
Given a = a1 a2 ... an, let
F(a) := a1' a2' ... an', where ai' = ai ⊕ f(i-1)(a1, a2, ..., a(i-1))
and ⊕ is the XOR operation, for i = 1, 2, ..., n. Here, F is the prefix-preserving anonymization function; it is a one-to-one function from {0,1}^n to {0,1}^n.
Prefix-preserving anonymization belongs to the class of typed transformations, which use a single anonymized value for each unique value of the original data [14]. The tool TCPdpriv uses prefix-preserving anonymization. Crypto-PAn is an approach developed by Fan et al. for creating prefix-preserving anonymized addresses without using a prefix table [15].
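The construction above can be sketched on bit strings. The keyed bit function below is a toy stand-in (SHA-256 of key and prefix) for the AES-based function of Crypto-PAn, so this is an illustration of the prefix-preserving property, not the real scheme:

```python
import hashlib

def _f(key, prefix):
    """Keyed pseudorandom bit f_{i-1}(a1..a_{i-1}); a toy stand-in for
    Crypto-PAn's cipher-based function."""
    return hashlib.sha256((key + "|" + prefix).encode()).digest()[0] & 1

def anonymize_bits(bits, key="secret"):
    """a'_i = a_i XOR f_{i-1}(a_1..a_{i-1}). Each output bit depends only
    on the key and the preceding real bits, so equal real prefixes map to
    equal anonymized prefixes."""
    return "".join(str(int(b) ^ _f(key, bits[:i])) for i, b in enumerate(bits))
```

For two addresses sharing a 4-bit prefix, the anonymized forms share exactly a 4-bit prefix and then diverge, which is the defining property.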
VI. METHODOLOGY & RESULTS
Since most of the methods specified above have drawbacks, summarized in Table 1, there is a need to implement new methods to prevent security breaches.
Anonymization Technique | Original Data | Anonymized Data           | Drawbacks
Data Hiding             | 12            | 0                         | Reduces utility of the dataset
Data Shifting           | 9370207875    | 9370208874                | The anonymization process may be reversed
Data Truncation         | 0712345345    | 071                       | Results in loss of information
Data Permutation        | Swati Ganar   | Reetu Ganar, Praful Ganar | Number of original records is not preserved
Data Enumeration        | 12            | 36                        | Strictly used where order of execution is necessary
Table 1: Results of anonymization techniques
Our method works on numerical attributes. It calculates 'value MOD n' of the numerical field and then displays the anonymized dataset. The divisor n is taken as the minimum value of that field in the dataset. For example, to anonymize the Age field of the hospital dataset, where the minimum age is 20, MOD is calculated as
65 MOD 20 = 5
Now 5 is stored in the dataset instead of 65, and the Age field is thereby anonymized. Each time a new entry is stored in the dataset, 'MOD n' is recalculated for the Age field, based on the minimum age of a person. The result of this anonymization is shown below:
Fig.8 Dynamic MOD
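The MOD-n step can be sketched as follows (a minimal sketch: it assumes all values are positive and n is simply the current minimum, as described above; note the mapping is lossy, e.g. 65 and 85 both map to 5 when n = 20, so it is not reversible without extra state):

```python
def mod_anonymize(values):
    """'MOD n' anonymization: n is the minimum value currently in the
    field; each stored value becomes value % n."""
    n = min(values)
    return [v % n for v in values]

ages = [65, 20, 43]
anonymized_ages = mod_anonymize(ages)
```

Adding a new record below the current minimum changes n, so the whole field must be re-anonymized, which is what makes the scheme dynamic.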
An attempt has also been made to make the anonymization process dynamic, as shown below.
Fig.9 Dynamic Anonymization
Dynamic anonymization uses the JavaScript setInterval() method, by which the data is re-anonymized every 10 seconds. This makes it harder for an adversary to deanonymize the dataset.
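A Python analogue of this periodic re-anonymization (a sketch: setInterval() is JavaScript, so `threading.Timer` stands in for it here, and the 10-second default mirrors the interval described above):

```python
import threading

def schedule_reanonymization(anonymize, interval=10.0):
    """Re-run the anonymization step every `interval` seconds, a Python
    analogue of JavaScript's setInterval(). Returns the first timer so
    the caller could cancel the cycle before it fires."""
    def tick():
        anonymize()                       # re-anonymize the dataset
        t = threading.Timer(interval, tick)  # schedule the next round
        t.daemon = True
        t.start()
    timer = threading.Timer(interval, tick)
    timer.daemon = True
    timer.start()
    return timer
```

Each timer fires once and schedules its successor, so the anonymization keeps repeating until the process exits or the current timer is cancelled.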
VII. CONCLUSION AND FUTURE WORK
In spite of the safeguards in place, Cloud computing faces privacy and security concerns. Cloud
computing requires standard methodologies and technical solutions to assess privacy risks and establish
adequate protection levels. A strong protection should be ensured by organizations, agencies for private data
irrespective of the environment where the data is actually stored. Because loss of this sensitive data may create a
negative impact for organizations.
Anonymization is a viable technique for securing cloud computing. It limits the misuse of sensitive data, but it is not a complete solution for preserving confidentiality. In this paper, we surveyed a few anonymization methods and implemented some anonymization techniques to protect sensitive data in the cloud. Formal models of security for anonymization were also discussed. Many anonymization techniques have been implemented, but the fear of security breaches remains: research on anonymization and deanonymization is ongoing, and techniques that are currently safe may fail in the future. Still, data anonymization is a viable solution that is highly recommended for security in the cloud, and the available anonymization techniques may be integrated to achieve better results. In the future, privacy preservation in the cloud will require many more efforts.
REFERENCES
[1] Jeff Sedayao, "Enhancing Cloud Security Using Data Anonymization", Intel white paper, June 2012.
[2] R. Pang, M. Allman, V. Paxson, and J. Lee, "The Devil and Packet Trace Anonymization", ACM Computer Communication Review, 36(1):29-38, January 2006.
[3] A. Slagell and W. Yurcik, "Sharing Computer Network Logs for Security and Privacy: A Motivation for New Methodologies of Anonymization", in Proceedings of SECOVAL: The Workshop on the Value of Security through Collaboration, pp. 80-89, September 2005.
[4] Latanya Sweeney, "Achieving k-Anonymity Privacy Protection Using Generalization and Suppression", Int'l J. on Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, p. 571, 2002.
[5] Information Commissioner's Office, "Anonymization: Managing Data Protection Risk, Code of Practice", 2012.
[6] P. Samarati, "Protecting Respondents' Privacy in Microdata Release", IEEE Trans. Knowledge and Data Eng., vol. 13, no. 6, pp. 1010-1027, Nov./Dec. 2001.
[7] L. Sweeney, "k-Anonymity: A Model for Protecting Privacy", Int'l J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.
[8] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, "l-Diversity: Privacy Beyond k-Anonymity", in ICDE, 2006, p. 24.
[9] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian, "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity", 2007.
[10] Paul Ohm, "Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization", 57 UCLA Law Review 1701, 2010.
[11] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian, "Closeness: A New Privacy Measure for Data Publishing", IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 7, July 2010.
[12] Junqiang Liu and Ke Wang, "On Optimal Anonymization for l+-Diversity", in Data Engineering (ICDE), 2010 IEEE, pp. 213-224.
[13] E. Boschi and B. Trammell, "IP Flow Anonymization Support", Internet-Draft draft-ietf-ipfix-anon-06.txt, 2011.
[14] Scott E. Coull et al., "Playing Devil's Advocate: Inferring Sensitive Information from Anonymized Network Traces", NDSS, 2007.
[15] J. Fan, J. Xu, M. H. Ammar, and S. B. Moon, "Prefix-Preserving IP Address Anonymization: Measurement-based Security Evaluation and a New Cryptography-based Scheme", Computer Networks, 46(2):253-272, 2004.