Everyone involved is concerned about the leakage of private data, i.e., the privacy of an individual's data. Today, data privacy is one of the most serious concerns that people face at both the individual and organisational level, and it must be dealt with effectively using privacy preserving data mining.
A Review Study on the Privacy Preserving Data Mining Techniques and Approaches
In this paper, we review various privacy preserving data mining techniques, such as data modification and secure multiparty computation, from several different aspects.
Index Terms– Privacy and Security, Data Mining, Privacy
Preserving, Secure Multiparty Computation (SMC) and Data
Modification
Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that lack a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing sliced data that obey the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.
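The core slicing operation described above (vertical grouping of correlated attributes plus horizontal bucketing, with a random permutation inside each bucket to break the quasi-identifier/sensitive linkage) can be illustrated with a minimal sketch. The toy table, column grouping, and bucket size below are invented for illustration; this is not the paper's implementation:

```python
import random

# Toy microdata table: each row is (Age, Sex, Zipcode, Disease).
rows = [
    (22, "M", "47906", "dyspepsia"),
    (22, "F", "47906", "flu"),
    (33, "F", "47905", "flu"),
    (52, "F", "47905", "bronchitis"),
    (54, "M", "47302", "flu"),
    (60, "M", "47302", "dyspepsia"),
]

# Vertical partition: group correlated attributes into column groups.
col_groups = [(0, 1), (2, 3)]   # {Age, Sex} and {Zipcode, Disease}
bucket_size = 2                 # horizontal partition

random.seed(1)
sliced = []
for start in range(0, len(rows), bucket_size):
    bucket = rows[start:start + bucket_size]
    # Independently permute each column group within the bucket, breaking
    # the row-level linkage between quasi-identifiers and the sensitive value.
    permuted = []
    for group in col_groups:
        values = [tuple(r[i] for i in group) for r in bucket]
        random.shuffle(values)
        permuted.append(values)
    for parts in zip(*permuted):
        sliced.append(tuple(v for part in parts for v in part))

for row in sliced:
    print(row)
```

Within each bucket every (Age, Sex) pair and every (Zipcode, Disease) pair still appears, so aggregate utility is retained, but an adversary can no longer tell which sensitive value belongs to which individual inside the bucket.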
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Nowadays, data sharing between two organizations is common in many application areas, such as business planning or marketing. When data are shared between parties, there may be sensitive data that should not be disclosed to the other parties. Medical records are especially sensitive, so their privacy protection is taken even more seriously. As required by the Health Insurance Portability and Accountability Act (HIPAA), it is necessary to protect the privacy of patients and ensure the security of medical data. To address this problem, released datasets must unavoidably be modified. We propose and implement a method called the hybrid approach for privacy preserving. First, we randomize the original data; then we apply generalization to the randomized data. This technique protects private data with better accuracy; it can also reconstruct the original data, providing data with no information loss and preserving its usability.
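The two-stage pipeline the abstract describes, randomize first, then generalize, can be sketched on a single numeric attribute. The noise spread, interval width, and sample ages are invented for illustration; the paper's actual parameters may differ:

```python
import random

def randomize(age: int, spread: int = 5) -> int:
    """Stage 1: perturb the value with uniform noise in [-spread, spread]."""
    return age + random.randint(-spread, spread)

def generalize(age: int, width: int = 10) -> str:
    """Stage 2: map the randomized age onto a coarse interval such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

random.seed(42)
original = [23, 37, 41, 58, 62]
released = [generalize(randomize(a)) for a in original]
print(released)
```

Randomization alone allows statistical reconstruction of the original distribution, while generalization alone leaks exact value ranges; applying both means the released interval no longer pins down even the perturbed value.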
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Privacy is an important issue in data mining and knowledge discovery. In this paper, we propose to use randomized response techniques to conduct the data mining computation. Specifically, we present a method to build decision tree classifiers from the disguised data. We conduct experiments to compare the accuracy of our decision tree with one built from the original, undisguised data. Our results show that although the data are disguised, our method can still achieve fairly high accuracy. We also show how the parameter used in the randomized response techniques affects the accuracy of the results.
Keywords
Privacy, security, decision tree, data mining
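The classic randomized response mechanism behind this abstract is easy to sketch: each respondent tells the truth with probability p and lies otherwise, and the analyst inverts the known distortion to recover population statistics. The parameter value and simulation below are illustrative, not taken from the paper:

```python
import random

def randomized_response(truth: bool, p: float = 0.7) -> bool:
    """With probability p report the true answer, otherwise its negation."""
    return truth if random.random() < p else not truth

def estimate_true_proportion(responses, p: float = 0.7) -> float:
    """Recover the population proportion of 'yes' from disguised answers.
    If pi is the true proportion, E[observed] = pi*p + (1-pi)*(1-p),
    hence pi = (observed - (1 - p)) / (2p - 1)."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p)) / (2 * p - 1)

# Simulate: 30% of 100,000 respondents truly answer "yes".
random.seed(0)
truths = [random.random() < 0.3 for _ in range(100_000)]
disguised = [randomized_response(t) for t in truths]
print(round(estimate_true_proportion(disguised), 2))
```

The same inversion applied to class counts at each candidate split is what lets a decision tree learner estimate information gain from disguised records; as p approaches 0.5 the estimator's variance grows, which is the accuracy/privacy trade-off the abstract refers to.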
Privacy Preserving Databases: how they are managed, built and secured, with an introduction to the main anonymization techniques, PPDB data mining, P3P and Hippocratic databases.
Cluster Based Access Privilege Management Scheme for Databases
Knowledge discovery is carried out using data mining techniques. Association rule mining, classification and clustering operations are carried out under data mining. Clustering is used to group records based on relevancy; distance or similarity measures are used to estimate the relationships between transactions. Census data and medical data are referred to as microdata. Data publishing schemes are used to provide private data for analysis. Privacy preservation is used to protect private data values, and anonymity is considered in the privacy preservation process.
Data values are made available to authorized users through access control models. A Privacy Protection Mechanism (PPM) uses suppression and generalization of relational data to anonymize the data and satisfy privacy needs. An accuracy-constrained privacy-preserving access control framework is used to manage access control in relational databases. The access control policies define the selection predicates available to roles, while the privacy requirement is to satisfy k-anonymity or ℓ-diversity. An imprecision bound constraint is assigned to each selection predicate. k-anonymous Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access Control (RBAC) allows permissions on objects to be defined based on roles in an organization. The Top Down Selection Mondrian (TDSM) algorithm is used for query workload-based anonymization; it is constructed using greedy heuristics and a kd-tree model. Query cuts are selected with minimum bounds in the Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as partitions are added to the output in the Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in the Top-Down Heuristic 3 algorithm (TDH3). A repartitioning algorithm is used to reduce the total imprecision of the queries.
The privacy preserving access privilege management scheme is enhanced to provide incremental mining features. Data insert, delete and update operations are connected with the partition management mechanism. Cell-level access control is provided with a differential privacy method. A dynamic role management model is integrated with the access control policy mechanism for query predicates.
Data mining over diverse data sources is a useful means of discovering valuable patterns, associations, trends, and dependencies in data. Many variants of this problem exist, depending on how the data is distributed, what type of data mining we wish to do, how the privacy of the data is to be achieved, and what restrictions are placed on the sharing of information. A transactional database owner lacking the expertise or computational resources can outsource its mining tasks to a third-party service provider or server. However, both the itemsets and the association rules of the outsourced database are considered private property of the database owner.
In this paper, we consider a scenario where multiple data sources are willing to share their data with a trusted third party, called the combiner, who runs data mining algorithms over the union of their data, as long as each data source is guaranteed that its information that does not pertain to another data source will not be revealed. The proposed algorithm is characterized by (1) secret sharing based secure key transfer for distributed transactional databases, with lightweight encryption, used for preserving privacy, and (2) a rough set based mechanism for association rule extraction for an efficient mining task. Performance analysis and experimental results are provided to demonstrate the effectiveness of the proposed algorithm.
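The secret sharing primitive this abstract relies on can be shown with the simplest variant, additive secret sharing, where a combiner learns an aggregate without seeing any party's raw value. This is a generic sketch of the primitive, not the paper's key-transfer protocol; the modulus and counts are invented:

```python
import random

PRIME = 2_147_483_647  # all arithmetic is done modulo a large prime

def share(secret: int, n_parties: int):
    """Split a value into n additive shares that sum to the secret mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Three data sources each hold a local item count; the combiner should learn
# only the total, never any individual count.
random.seed(7)
local_counts = [120, 75, 300]
all_shares = [share(c, 3) for c in local_counts]

# Each party sums the shares it receives (one from every source) ...
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
# ... and the combiner adds the partial sums to recover the global total.
total = sum(partial_sums) % PRIME
print(total)  # 495
```

Each individual share is uniformly random, so no single party (or the combiner, before the final step) learns anything about another source's count, which is exactly the guarantee the scenario above requires.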
Secured Frequent Itemset Discovery in Multi Party Data Environment
Security and privacy methods are used to protect data values. Private data values are secured with confidentiality and integrity methods. A privacy model hides individual identities in public data values. Sensitive attributes are protected using anonymity methods. Two or more parties hold their own private data in a distributed environment; the parties can collaborate to compute any function on the union of their data. Secure Multiparty Computation (SMC) protocols are used in privacy preserving data mining in distributed environments. Association rule mining techniques are used to fetch frequent patterns; the Apriori algorithm is used to mine association rules in databases. Homogeneous databases share the same schema but hold information on different entities. Horizontal partitioning refers to a collection of homogeneous databases maintained by different parties. The Fast Distributed Mining (FDM) algorithm is an unsecured distributed version of the Apriori algorithm. The Kantarcioglu and Clifton protocol is used for secure mining of association rules in horizontally distributed databases. The Unifying lists of locally Frequent Itemsets Kantarcioglu and Clifton (UniFI-KC) protocol is used for the rule mining process in a partitioned database environment. The UniFI-KC protocol is enhanced in two ways to improve security: a secure computation of threshold function algorithm is used to compute the union of the private subsets held by each of the interacting players, and a set inclusion computation algorithm is used to test the inclusion of an element held by one player in a subset held by another. The system is improved to support secure rule mining in a vertically partitioned database environment, and the subgroup discovery process is adapted to it. The system can be improved to support a generalized association rule mining process and is enhanced to control security leakages in the rule mining process.
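The Apriori algorithm that FDM and UniFI-KC distribute can be sketched in its plain, single-site form: repeatedly extend frequent k-itemsets into (k+1)-candidates and prune those below the support threshold. The toy transactions are invented for illustration:

```python
def apriori(transactions, min_support):
    """Return all itemsets appearing in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent, current = {}, {s for s in items if support(s) >= min_support}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets;
        # the anti-monotone property guarantees no frequent set is missed.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent

freq = apriori([{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}], min_support=2)
print(sorted((sorted(s), c) for s, c in freq.items()))
```

In the horizontally partitioned setting, each party runs these support counts locally and the protocols above securely combine the local counts (and candidate unions) so that globally frequent itemsets emerge without any party revealing its own transactions.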
A Comparative Study on Privacy Preserving Data Mining Techniques
Privacy protection has become very important in recent years because of the increasing ability to store data. In particular, recent advances in the data mining field have led to increased concerns about privacy. Data in its original form, however, typically contains sensitive information about individuals, and publishing such data would violate individual privacy. Current practice in data publishing is based on what type of data can be released and how that data will be used. Recently, PPDM has received immense attention in research communities, and many approaches have been proposed for different data publishing scenarios. In this comparative study, we systematically summarize and evaluate different approaches to PPDM, study the challenges, differences and requirements that distinguish PPDM from other related problems, and propose future research directions.
A Review on Privacy Preservation in Data Mining
The main focus of privacy preserving data publishing has been to enhance traditional data mining techniques for masking sensitive information through data modification. The major issues are how to modify the data and how to recover the data mining result from the modified data. The solutions are often tightly coupled with the data mining algorithms under consideration. Privacy preserving data publishing focuses on techniques for publishing data, not techniques for data mining; it is expected that standard data mining techniques will be applied to the published data. Anonymization of the data is done by hiding the identity of record owners, whereas privacy preserving data mining seeks to directly hide the sensitive data. This survey examines the various privacy preservation techniques and algorithms.
Data Mining Privacy Concerns PPT Presentation
This is a sample presentation on data mining. The presentation looks at critical issues in data mining: the privacy, national security and personal liberty implications of data mining.
In the world of Big Data, there has been a great deal of research into creating efficient algorithms that can help us gain statistical insight from the large databases that record much of our lives. However, as our digital footprints become larger, many databases that were originally considered anonymous can now be re-identified. How do we make sure that does not happen?
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataarx-deidentifier
Website with further information: http://arx.deidentifier.org
Description of this talk:
Collaboration and data sharing have become core elements of biomedical research. Especially when sensitive data from distributed sources are linked, privacy threats have to be considered. Statistical disclosure control allows the protection of sensitive data by introducing fuzziness. Reduction of data quality, however, needs to be balanced against gains in protection. Therefore, tools are needed which provide a good overview of the anonymization process to those responsible for data sharing. These tools require graphical interfaces and the use of intuitive and replicable methods. In addition, extensive testing, documentation and openness to reviews by the community are important. Existing publicly available software is limited in functionality, and often active support is lacking. We present the data anonymization tool ARX, which has been developed in close cooperation between the Chair for Biomedical Informatics, the Chair for IT Security and the Chair for Database Systems at Technische Universität München (TUM), Germany. ARX enables the de-identification of structured data (i.e., tabular data) and implements a wide variety of privacy methods in a highly efficient manner. It is extensible, well documented and actively supported. ARX provides an intuitive cross-platform graphical interface and offers a public API for integration with other software systems.
Many areas of scientific discovery rely on combining data from multiple data sources. However, there are many challenges in linking data. This presentation highlights these challenges in the context of using Linked Data for environmental and social science databases.
Privacy Preserved Distributed Data Sharing with Load Balancing SchemeEditor IJMTER
Data sharing services are provided under the Peer to Peer (P2P) environment. Federated database technology is used to manage locally stored data with a federated DBMS and provide unified data access. Information brokering systems (IBSs) connect large-scale, loosely federated data sources via a brokering overlay. Information brokers redirect client queries to the requested data servers. Privacy-preserving methods are used to protect the data location and the data consumer. Brokers are trusted to enforce server-side access control for data confidentiality. Query and access control rules are maintained, along with shared data details, as metadata. A semantic-aware index mechanism is applied to route queries based on their content, allowing users to submit queries without data or server information.
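The semantic-aware index described above can be sketched as a simple keyword-to-server map; the names and structure below are illustrative assumptions, not the paper's actual design:

```python
# Hypothetical sketch of a semantic-aware index: brokers map query
# keywords to data servers, so clients need no server information.
def build_index(server_topics):
    """server_topics: {server: set of topic keywords it serves}."""
    index = {}
    for server, topics in server_topics.items():
        for topic in topics:
            index.setdefault(topic, set()).add(server)
    return index

def route(index, query_keywords):
    """Return the servers whose content matches any query keyword."""
    matched = set()
    for kw in query_keywords:
        matched |= index.get(kw, set())
    return matched

index = build_index({"srv-a": {"oncology", "imaging"},
                     "srv-b": {"cardiology", "imaging"}})
print(sorted(route(index, {"imaging"})))  # ['srv-a', 'srv-b']
```

The client only names topics; which server answers stays hidden behind the broker, which is the property the brokering overlay relies on.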
Distributed data sharing is managed with the Privacy Preserved Information Brokering (PPIB) scheme. Attribute-correlation and inference attacks are handled by PPIB. The PPIB overlay infrastructure consists of two types of brokering components: brokers and coordinators. The brokers, acting as mix anonymizers, are responsible for user authentication and query forwarding. The coordinators, concatenated in a tree structure, enforce access control and query routing based on automata. Automaton segmentation and query segment encryption schemes are used in the Privacy-preserving Query Brokering (QBroker). The automaton segmentation scheme logically divides the global automaton into multiple independent segments. The query segment encryption scheme consists of pre-encryption and post-encryption modules.
The PPIB scheme is enhanced to support dynamic site distribution and a load balancing mechanism. Peer workloads and the trust level of each peer are integrated into the site distribution process. PPIB is further improved to adopt a self-reconfigurable mechanism, and an automated decision support system for administrators is included.
Brisbane Health-y Data: Queensland Data Linkage FrameworkARDC
Presentation given by Trisha Johnston and Catherine Taylor at the 'Sharing Health-y Data Workshop: Challenges and Solutions' event co-hosted by ANDS and HISA. Held on Wednesday 16th March 2016 at the Translational Research Institute, Brisbane, Australia.
A discussion on the research paper 'An Efficient Approximate Protocol for Privacy-Preserving Association Rule Mining' by 'Murat Kantarcioglu, Robert Nix , and Jaideep Vaidya'
Data Privacy: Anonymization & Re-IdentificationMike Nowakowski
With the rise of the Internet of Things, Big Data and Open Data, data privacy is increasingly important to organizations. Data de-identification is a process to remove identifying information from a data set. This presentation will provide a gentle introduction to data de-identification, anonymization and the reverse process of re-identification.
Engineering data privacy - The ARX data anonymization toolarx-deidentifier
Website with further information: http://arx.deidentifier.org
Description of this talk:
While a plethora of methods have been proposed for dealing with many aspects of de-identifying clinical data, only few (prototypical) implementations are available. Actually, the complexity of implementing privacy technologies is an often overlooked challenge.
In this talk we will present the open source data de-identification tool ARX, which has been carefully engineered to support multiple privacy technologies for relational datasets. Our tool bridges the gap between different scientific disciplines by integrating methods developed and used by the statistics community with data anonymization techniques developed by computer scientists.
ARX has been designed from the ground up to ensure scalability and it is able to process very large datasets on commodity hardware. The software implements a large set of
privacy models: (1) syntactic privacy models, such as k-anonymity, l-diversity, t-closeness and δ-presence, (2) statistical models for re-identification risks, and (3) differential privacy. In the talk, we will focus on measures to reduce the uniqueness of records. ARX also supports more than ten different methods for evaluating data utility, including loss, precision, non-uniform entropy and KL divergence.
In ARX, de-identification of data can be performed automatically, semi-automatically and manually using a complex method that integrates global recoding, local recoding, categorization, generalization, suppression, microaggregation and top/bottom-coding. All methods are accessible via a comprehensive cross-platform graphical user interface.
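As an illustration of one of the methods listed above, global recoding with a generalization hierarchy replaces every quasi-identifier value with a coarser value, uniformly across all records. This is a hand-written sketch, not ARX's actual API:

```python
# Illustrative sketch (not the ARX API): global recoding replaces each
# quasi-identifier value with a coarser value from a generalization
# hierarchy, applied uniformly to every record.
def generalize_age(age, level):
    if level == 0:
        return str(age)              # no generalization
    if level == 1:
        lo = (age // 10) * 10        # 10-year bands, e.g. "30-39"
        return f"{lo}-{lo + 9}"
    return "*"                       # level 2: full suppression

records = [23, 27, 34, 35, 61]
print([generalize_age(a, 1) for a in records])
# ['20-29', '20-29', '30-39', '30-39', '60-69']
```

Tools such as ARX search over hierarchy levels like these to find a transformation that satisfies the chosen privacy model while minimizing a utility-loss measure.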
A Review on Privacy Preservation in Data Miningijujournal
The main focus of privacy preserving data publishing was to enhance traditional data mining techniques for masking sensitive information through data modification. The major issues were how to modify the data and how to recover the data mining result from the altered data. The reports were often tightly coupled with the data mining algorithms under consideration. Privacy preserving data publishing focuses on techniques for publishing data, not techniques for data mining. In this case, it is expected that standard data mining techniques are applied to the published data. Anonymization of the data is done by hiding the identity of record owners, whereas privacy preserving data mining seeks to directly hide the sensitive data. This survey reviews the various privacy preservation techniques and algorithms.
CATEGORIZATION OF FACTORS AFFECTING CLASSIFICATION ALGORITHMS SELECTIONIJDKP
A lot of classification algorithms are available in the area of data mining for solving the same kind of problem, with little guidance on which algorithm will give the best results for the dataset at hand. As a way of improving the chances of recommending the most appropriate classification algorithm for a dataset, this paper focuses on the different factors considered by data miners and researchers in different studies when selecting the classification algorithms that will yield the desired knowledge for the dataset at hand. The paper divides the factors affecting classification algorithm recommendation into business and technical factors. The technical factors proposed are measurable and can be exploited by recommendation software tools.
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
Large amounts of data have been stored and manipulated using various database technologies. Processing all attributes for a particular purpose is a difficult task. To avoid such difficulties, a feature selection process is applied. In this paper, we collect eight benchmark datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based relevance measure algorithm, following three selection strategies: mean selection, half selection, and neural network threshold selection. After the features are selected, they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1 and Ant-miner classification methodologies. The test results show that the neural network threshold selection strategy works well in selecting features, and that the Ant-miner methodology works best in achieving better accuracy with the selected features than with the original dataset. The results of this experiment clearly show that Ant-miner is superior to the other classifiers. Thus, the proposed Ant-miner algorithm could be a more suitable method for producing good results with fewer features than the original datasets.
Privacy Preservation and Restoration of Data Using Unrealized Data SetsIJERA Editor
In today’s world, advances in hardware technology have increased the capability to store and record personal data about consumers and individuals. Data mining successfully extracts knowledge to support a variety of areas such as marketing, medical diagnosis, weather forecasting and national security. Still, it is a challenge to extract certain kinds of knowledge without violating the data owners’ privacy. As data mining becomes more pervasive, such privacy concerns are increasing. This gives birth to a new category of data mining methods called privacy preserving data mining (PPDM) algorithms. The aim of these algorithms is to protect the sensitive information within a large data set. The privacy preservation of a data set can be expressed in the form of a decision tree. This paper proposes privacy preservation based on data set complement algorithms which store the information of the real dataset, so that the private data are safe from unauthorized parties; if some portion of the data is lost, the original data set can be recreated from the unrealized dataset and the perturbed data set.
Agile analytics : An exploratory study of technical complexity managementAgnirudra Sikdar
The thesis involved the reviewing of various case studies to determine the types of modelling, choice of algorithm, types of analytical approaches and trying to determine the various complexities arising from these cases. From these reviews, procedures have been proposed to improve the efficiency and manage the various types of complexities from using agile methodological perspective. Focus was mostly done on Customer Segmentation and Clustering , with the sole purpose to bridge Big Data and Business Intelligence together using Analytic.
6 ijaems sept-2015-6-a review of data security primitives in data miningINFOGAIN PUBLICATION
This paper discusses various issues and security primitives, such as spatial data handling, privacy protection of data, data load balancing and resource mining, in the area of data mining. A 5-stage review process has been conducted for 30 research papers published between 1996 and 2013. After an exhaustive review, nine key issues were found: spatial data handling, data load balancing, resource mining, visual data mining, data cluster mining, privacy preservation, mining of gaps between business tools and patterns, and mining of hidden complex patterns. These have been resolved and explained with proper methodologies, and several solution approaches are discussed across the 30 papers. This paper provides the outcome of the review in the form of various findings under the key issues, including the algorithms and methodologies used by researchers along with their strengths and weaknesses and the scope for future work in the area.
GRA, NIEM and XACML Security Profiles July 2012Bizagi Inc
Details how to use policy rule templates to manage content access rules. Avoiding the pitfalls of the ABAC approach. Providing a method for policy analysts to quickly markup content without requiring deep programming knowledge.
Performance analysis of perturbation-based privacy preserving techniques: an ...IJECEIAES
Nowadays, enormous amounts of data are produced every second. These data also contain private information from sources including media platforms, the banking sector, finance, healthcare, and criminal histories. Data mining is a method for looking through and analyzing massive volumes of data to find usable information. Preserving personal data during data mining has become difficult, thus privacy-preserving data mining (PPDM) is used to do so. Data perturbation is one of the several tactics used by the PPDM data privacy protection mechanism. In perturbation, datasets are perturbed in order to preserve personal information. Both data accuracy and data privacy are addressed by it. This paper will explore and compare several hybrid perturbation strategies that may be used to protect data privacy. For this, two perturbation-based techniques named improved random projection perturbation (IRPP) and enhanced principal component analysis-based technique (EPCAT) were used. These methods are employed to assess the precision, run time, and accuracy of the experimental results. This paper provides the impacts of perturbation-based privacy preserving techniques. It is observed that hybrid approaches are more efficient than the traditional approach.
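The general idea behind random-projection perturbation can be sketched in a few lines. This is a plain illustration of the technique's core, not the IRPP algorithm from the paper; the Gaussian projection matrix and function names are assumptions:

```python
import random

def random_projection(data, k, seed=42):
    """Perturb d-dimensional records by projecting them onto a random
    k-dimensional subspace: original values cannot be read off directly,
    while pairwise distances are roughly preserved on average."""
    rng = random.Random(seed)
    d = len(data[0])
    # Gaussian random projection matrix (d x k), kept secret by the owner
    R = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * R[i][j] for i in range(d)) for j in range(k)]
            for row in data]

data = [[5.0, 1.2, 3.3], [4.8, 1.0, 3.1]]
perturbed = random_projection(data, k=2)
print(perturbed)  # two 2-dimensional perturbed records
```

Because the projection is dimension-reducing and the matrix is secret, recovering exact attribute values from the published records is hard, which is the privacy side of the accuracy/privacy trade-off the paper measures.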
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph has no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
5. PPDM
Romalee Amolic
Introduction
Literature Survey
Methodology Used
Algorithms Used
Advantages and Disadvantages
Conclusion
Future Scope
References
Introduction of Proposed System
Data mining takes place at various levels. The three entities involved can be categorised as:
Data Provider: the one who provides the data. Concern: whether he can control the sensitivity of the data he provides to others.
Data Collector: the user who collects data from data providers and then publishes the data to the data miner. Concern: to guarantee that the modified data contain no sensitive information but still preserve high utility.
Data Miner: the user who performs data mining tasks on the data. Concern: how to prevent sensitive information from appearing in the mining results.
6. PPDM
Literature Survey:
The randomization method: The randomization method is a
technique for privacy-preserving data mining in which noise is
added to the data in order to mask the attribute values of records.
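A minimal sketch of the randomization method, assuming zero-mean Gaussian noise (the function name and data are illustrative):

```python
import random

# Randomization method: add zero-mean noise to each numeric attribute
# so individual values are masked, while aggregate statistics
# (e.g. the mean) remain approximately recoverable by the miner.
def randomize(values, sigma, seed=7):
    rng = random.Random(seed)
    return [v + rng.gauss(0, sigma) for v in values]

salaries = [52000, 61000, 58000, 47000]
masked = randomize(salaries, sigma=1000)
print(masked)                       # individual salaries are hidden
print(sum(masked) / len(masked))    # but the mean stays close to the truth
```

The larger sigma is, the stronger the masking and the noisier any reconstructed distribution, which is exactly the privacy/utility trade-off this technique exposes.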
The k-anonymity model and l-diversity: In the k-anonymity
method, we reduce the granularity of data representation with the
use of techniques such as generalization and suppression.[4]
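A generalized table satisfies k-anonymity when every combination of quasi-identifier values occurs at least k times, so no record is unique on those attributes. A minimal check (illustrative sketch, with already-generalized example records):

```python
from collections import Counter

# k-anonymity check: after generalization/suppression, every
# quasi-identifier combination must appear at least k times.
def is_k_anonymous(records, quasi_ids, k):
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"age": "30-39", "zip": "411**", "disease": "flu"},
    {"age": "30-39", "zip": "411**", "disease": "cold"},
    {"age": "40-49", "zip": "412**", "disease": "flu"},
    {"age": "40-49", "zip": "412**", "disease": "ulcer"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
```

l-diversity then adds the further requirement that each such group also contain at least l distinct sensitive values, guarding against attribute disclosure when a whole group shares one disease.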
Association rule mining can prove to be the best method to preserve privacy [2]. This is one of the latest technologies and methods. It tries to eliminate the flaws, if any, in the previous methods.
Based on an in-depth study of the existing data mining and association rule mining algorithms, a new mining algorithm for weighted association rules can be proposed. It greatly reduces input and output time, and improves the efficiency of data mining.
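Weighted association rule mining builds on the standard support and confidence measures. As background, the plain (unweighted) versions can be computed as follows; this is an illustrative sketch, and a weighted variant would scale each transaction's contribution by its weight:

```python
# Support: fraction of transactions containing the itemset.
# Confidence: support of (antecedent + consequent) / support of antecedent.
def support(transactions, itemset):
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

txns = [{"bread", "milk"}, {"bread", "butter"},
        {"bread", "milk", "butter"}, {"milk"}]
print(support(txns, {"bread", "milk"}))       # 0.5
print(confidence(txns, {"bread"}, {"milk"}))  # ~0.667
```

Privacy-preserving variants of association rule mining work by perturbing these counts, or by hiding rules whose support or confidence would reveal sensitive patterns.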
7. PPDM
In addition to that, fuzzy association rules have been developed so as to alter the support and confidence of rules as per the requirements.
Data mining can be done at various stages [3]. This paper tries to explore various PPDM techniques based on a proposed PPDM classification hierarchy.
Data mining can be categorized into:
(a) centralized and
(b) distributed data mining.
In addition to the usual methods of anonymization and association rule mining, the methods of perturbation and cryptography are discussed in detail [3].
In order to deal with these issues, there must be a balance between the privacy and utility of the data. This is the most important reason for the large amount of research and development in this field [1].
32. PPDM
Step 2. Fuzzy Inferencing (Implication Method):
The truth value for the antecedent of each rule is computed and applied to the conclusion part of each rule. The degree of support is used.
If the antecedent is only partially true (i.e., is assigned a value less than 1), then the output fuzzy set is truncated according to the implication method. If the consequent of a rule has multiple parts, then all consequents are affected equally by the result of the antecedent. min: truncates the consequent's membership function.
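The min implication described above can be sketched as follows. This is a hand-written Mamdani-style example; the triangular membership function and sample points are assumed for illustration:

```python
# min implication: the antecedent's truth value truncates (caps)
# the consequent's membership function.
def triangle(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def implicate(antecedent_truth, consequent_mf, xs):
    # min truncates the consequent at the antecedent's truth value
    return [min(antecedent_truth, consequent_mf(x)) for x in xs]

xs = [0, 1, 2, 3, 4]
out = implicate(0.5, lambda x: triangle(x, 0, 2, 4), xs)
print(out)  # the triangle's peak (1.0 at x=2) is capped at 0.5
```

With an antecedent truth of 0.5, the output fuzzy set is the consequent's triangle with its top sliced off at 0.5, which is exactly the truncation the slide describes.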
42. PPDM
Future Scope
Since no technique exists which overcomes all privacy issues, research in this direction can make significant contributions. The study can be carried out using any one of the existing techniques, using a combination of these, or by developing an entirely new technique.
The convex optimization method can be extended to any kind of association rules.
43. PPDM
References
[1] Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan, and Yong Ren, "Information Security in Big Data: Privacy and Data Mining".
[2] Lei Chen, "The Research of Data Mining Algorithm Based on Association Rules".
[3] Jisha Jose Panackal and Anitha S. Pillai, "Privacy Preserving Data Mining: An Extensive Survey".
[4] Charu C. Aggarwal and Philip S. Yu, "A General Survey of Privacy-Preserving Data Mining: Models and Algorithms".
[5] D. Jain, P. Khatri, R. Soni, and B. K. Chaurasia, "Hiding sensitive association rules without altering the support of sensitive item".
44. PPDM
[6] J.-M. Zhu, N. Zhang, and Z.-Y. Li, "A new privacy preserving association rule mining algorithm based on hybrid partial hiding strategy", Cybern. Inf. Technol., vol. 13, pp. 41-50, Dec. 2013.
[7] "Privacy Preserving Quantitative Association Rule Mining Using Convex Optimization Technique".
[8] Seema Kedar, Sneha Dhawale, and Vaibhav Wankhade, "Privacy Preserving Data Mining", International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, issue 4, April 2013.
[9] http://donottrack.us/
[10] http://webpages.uncc.edu/xwu/career/
[11] http://www.intechopen.com/