Privacy preserving in data mining with hybrid approach

Guided by: Prof. Paresh M. Solanki
Presented by: Narenndra Dhadhal, M.Tech. (III) IT, 14014021007

OUTLINE
1) Introduction to PPDM
2) Need for Privacy
3) Privacy Preserving Techniques
4) Literature Survey
5) K-Anonymization
6) Proposed Work
7) References

Introduction
• Privacy preservation is one of the most important research topics in the data security field, and in recent years it has become a serious concern in the secure transformation of personal data. [1]
• A number of algorithmic techniques have been designed for Privacy Preserving Data Mining (PPDM). [1]

Introduction (cont.)
• PPDM is used to efficiently protect individual privacy in data sharing. [1]
• Accordingly, various models have been designed for privacy preserving data sharing. [1]
• The survey in [1] analyzes various privacy preserving approaches to data sharing, together with their merits and demerits. [1]

Need for Privacy [2]
• Privacy preserving data mining has become increasingly popular because it allows privacy-sensitive data to be shared for analysis.
• Suppose a hospital has person-specific patient data that it wants to publish.
• It wants to publish the data such that:
  • the information remains practically useful, and
  • the identity of an individual cannot be determined.

Need for Privacy [4]
    Non-Sensitive Data            Sensitive Data
#   Zip     Age   Nationality     Name    Condition
1   13053   28    Indian          Kumar   Heart Disease
2   13067   29    American        Bob     Heart Disease
3   13053   35    Canadian        Ivan    Viral Infection
4   13067   36    Japanese        Umeko   Cancer
Fig 1: Sensitive and Non-Sensitive Data. [4]

Need for Privacy [5]
• Quasi-identifiers: a set of attributes that could potentially identify a record owner when combined with publicly available data.
• Sensitive attributes: a set of attributes that contain sensitive person-specific information, such as disease or salary.
• Non-sensitive attributes: a set of attributes that create no problem if revealed, even to untrustworthy parties.

Need for Privacy [4]
Published data:
#   Zip     Age   Nationality   Condition
1   13053   28    Indian        Heart Disease
2   13067   29    American      Heart Disease
3   13053   35    Canadian      Viral Infection
4   13067   36    Japanese      Cancer
Publicly available data:
#   Name    Zip     Age   Nationality
1   John    13053   28    American
2   Bob     13067   29    American
3   Chris   13053   23    American
Data leak!
Fig 2: Sensitive and Non-Sensitive Data Leak. [4]
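The leak in Fig 2 can be sketched as a join between the two tables. This is a minimal illustration, not the method of [4]: the rows are copied from Fig 2, while the function name `link` and the choice to match on Zip and Age only are assumptions of this sketch (the Nationality values in the figure do not agree for every row).

```python
# Rows copied from Fig 2. Matching on (zip, age) only is an assumption
# of this sketch, not something the slides state.
published = [
    {"zip": "13053", "age": 28, "nationality": "Indian", "condition": "Heart Disease"},
    {"zip": "13067", "age": 29, "nationality": "American", "condition": "Heart Disease"},
    {"zip": "13053", "age": 35, "nationality": "Canadian", "condition": "Viral Infection"},
    {"zip": "13067", "age": 36, "nationality": "Japanese", "condition": "Cancer"},
]
external = [
    {"name": "John", "zip": "13053", "age": 28, "nationality": "American"},
    {"name": "Bob", "zip": "13067", "age": 29, "nationality": "American"},
    {"name": "Chris", "zip": "13053", "age": 23, "nationality": "American"},
]


def link(published_rows, external_rows, keys):
    """Return (name, condition) pairs for people who match exactly one published row."""
    index = {}
    for row in published_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    leaks = []
    for person in external_rows:
        matches = index.get(tuple(person[k] for k in keys), [])
        if len(matches) == 1:  # a unique match re-identifies the record
            leaks.append((person["name"], matches[0]["condition"]))
    return leaks


leaks = link(published, external, keys=("zip", "age"))
print(leaks)
```

Under these assumptions John and Bob are each linked to a single published record, while Chris matches none.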

Privacy Preserving Techniques
• The important techniques of privacy preserving data mining are: [3]
1) The randomization method
2) The encryption method
3) The anonymization method

1. The Randomization Method [3]
• Randomization is an important and popular method among current privacy preserving data mining techniques.
• It masks the values of the records by adding extra data (noise) to the original data.
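A minimal sketch of the randomization idea, assuming zero-mean Gaussian noise (the slides say only that extra data is added, so the noise distribution, the function name `randomize`, and the synthetic ages are all assumptions). Individual values are masked while an aggregate such as the mean stays close to the original.

```python
import random
from statistics import mean


def randomize(values, sigma, seed=None):
    """Mask each value by adding independent zero-mean Gaussian noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical ages, used only to illustrate the effect.
ages = [20 + i % 40 for i in range(1000)]
masked = randomize(ages, sigma=5.0, seed=42)

# Individual values change, but the mean is approximately preserved.
print(mean(ages), round(mean(masked), 2))
```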

2. The Encryption Method [3]
• The encryption method mainly addresses settings in which several parties jointly carry out a mining task on the private inputs they each provide.
• Such mining tasks can occur between mutually untrusted parties, or even between competitors.
• Protecting privacy therefore becomes an important concern in a distributed data mining setting.
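The slides do not name a specific protocol. As one illustration of jointly computing on private inputs, the sketch below uses additive secret sharing to compute a sum without any party revealing its own value. The protocol choice, the names `make_shares` and `secure_sum`, the modulus, and the input counts are all assumptions of this example, and the parties are simulated in a single process.

```python
import secrets

# Public modulus, assumed to be larger than any possible total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split a private value into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Simulate a secure sum: only share subtotals are ever published."""
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j.
    all_shares = [make_shares(v, n) for v in private_inputs]
    # Party j publishes only the sum of the shares it received.
    subtotals = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(subtotals) % MODULUS


# Hypothetical per-party counts of some pattern in their local data.
counts = [12, 7, 30]
total = secure_sum(counts)
print(total)
```

Each individual share is uniformly random, so a single share reveals nothing about the input it came from; only the combined total is meaningful.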

3. The Anonymization Method [3]
• The anonymization method aims to make an individual record indistinguishable within a group of records by using generalization and suppression techniques.
• K-anonymity is the representative anonymization method.

Literature Survey [1]
Privacy Preserving Data Mining Techniques-Survey
Author: Ms. Dhanalakshmi M., Mrs. Siva Sankari (2014)
Summary: This paper discusses the models of privacy preservation: the Trusted Third Party Model, the Semi-honest Model, the Malicious Model, and other models such as Incentive Compatibility. It also surveys privacy preserving techniques such as the randomization method, the anonymization method, and the encryption method.
Issues/Challenges: Personalized privacy preservation will become an issue.

A Survey on Privacy Preserving Data Mining
Author: K. Saranya, K. Premalatha, S. S. Rajasekar (2015)
Summary: This paper presents a brief survey of various standard techniques for privacy preserving data mining, namely classification, clustering, and association rule mining.
Issues/Challenges: The merits and demerits of the different techniques are pointed out. As future work, the authors propose a hybrid approach combining these techniques.

A Survey on Privacy Preserving Data Mining
Author: Jian Wang, Yongcheng Luo, Yan Zhao, Jiajin Le (2009)
Summary: This paper restates several privacy preserving data mining technologies clearly and then analyzes the merits and shortcomings of these technologies.
Issues/Challenges: The limitations of the k-anonymity model stem from two assumptions. First, it may be very hard for the owner of a database to determine which attributes are or are not available in external tables. Second, the k-anonymity model assumes a certain method of attack, while in real scenarios there is no reason why the attacker should not try other methods.

A Survey on Anonymity-based Privacy Preserving
Author: Jian Wang, Yongcheng Luo, Shuo Jiang, Jiajin Le (2009)
Summary: In this paper the authors first show that a k-anonymous dataset permits strong attacks due to a lack of diversity in the sensitive attributes.
Issues/Challenges: While k-anonymity protects against identity disclosure, it does not provide sufficient protection against attribute disclosure.

Analysis of Privacy Preserving K-Anonymity Methods and Techniques
Author: S. Vijayarani, A. Tamilarasi, M. Sampoorna (2010)
Summary: This paper presents a survey of recent approaches that have been applied to the k-anonymity problem. Two main techniques have been proposed for enforcing k-anonymity on a private table: generalization and suppression.
Issues/Challenges: Threats to k-anonymity that can arise from mining a collection of data, and approaches to combining k-anonymity with data mining.

Privacy Preserving in Data Mining Using Hybrid Approach
Author: Savita Lohiya, Lata Ragha (2012)
Summary: This paper proposes a method called the hybrid approach for privacy preservation. The original data is first randomized, and generalization is then applied to the randomized (modified) data. According to the paper, this technique protects private data with better accuracy, can reconstruct the original data, and provides data with no information loss, which keeps the data usable.
Issues/Challenges: The k-anonymity method is vulnerable to the homogeneity attack and the background knowledge attack.

K-Anonymization
• Data anonymization is a type of information sanitization whose intent is privacy protection. [6]
• It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous. [6]
• For example, a hospital may release patients' records so that researchers can study the characteristics of various diseases. [6]

K-Anonymization
• There are two common methods for achieving k-anonymity for some value of k. [3]
• Suppression: certain values of the attributes are replaced by an asterisk '*'. All or some values of a column may be replaced by '*'. [3]
• Generalization: individual values of attributes are replaced with a broader category. For example, the value '33' of the attribute 'Age' may be replaced by '< 40', the value '24' by '20 < Age ≤ 30', etc. [3]
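The two operations can be sketched as small string transformations. The function names, the fixed bucket width of 10, and the choice to keep the first three zip digits are assumptions made for illustration only.

```python
def suppress_zip(zip_code, keep=3):
    """Suppression: replace all but the first `keep` digits with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def suppress_value(_value):
    """Cell-level suppression: hide the value entirely."""
    return "*"


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range 'lo < Age ≤ hi'."""
    hi = -(-age // width) * width  # round up to the next multiple of width
    lo = hi - width
    return f"{lo} < Age ≤ {hi}"


print(suppress_zip("13053"))      # 130**
print(suppress_value("Indian"))   # *
print(generalize_age(24))         # 20 < Age ≤ 30
```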

K-Anonymization (cont.)
#   Zip     Age    Nationality   Condition
1   130**   < 40   *             Heart Disease
2   130**   < 40   *             Heart Disease
3   130**   < 40   *             Viral Infection
4   130**   < 40   *             Cancer
Techniques shown: generalization and suppression (cell-level).
Fig 3: Generalization and Suppression. [2]

K-Anonymization (cont.) [4]
TABLE I. MICRODATA
ID   Attributes
     Age   Sex   Zip Code   Disease
1    26    M     83661      Headache
2    24    M     83634      Headache
3    31    M     83967      Viral Infection
4    39    F     83949      Cough
TABLE II. VOTER REGISTRATION LIST
ID   Attributes
     Name   Age   Sex   Zip Code
1    Jim    26    M     83661
2    Jay    24    M     83634
3    Tom    31    M     83967
4    Lily   39    F     83949

Classification of Attributes
1) Key attributes: [5]
Name, address, and phone number are uniquely identifying.
They are always removed before release.
2) Quasi-identifiers: [5]
A set of features whose associated values may be useful for linking with another data set to re-identify the entity that is the subject of the data.
For example, the combination (5-digit ZIP code, birth date, gender) can uniquely identify many individuals.

K-Anonymization (cont.) [4]
TABLE III. 2-ANONYMOUS TABLE
ID   Attributes
     Age   Sex   Zip Code   Disease
1    2*    M     836**      Headache
2    2*    M     836**      Headache
3    3*    *     839**      Viral Infection
4    3*    *     839**      Cough
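Whether a table is k-anonymous can be checked by counting how many records share each combination of quasi-identifier values. The sketch below uses the rows of Table III; the function name `is_k_anonymous` and the dictionary layout are assumptions of this example.

```python
from collections import Counter

# Rows of Table III (2-anonymous table).
table_iii = [
    {"age": "2*", "sex": "M", "zip": "836**", "disease": "Headache"},
    {"age": "2*", "sex": "M", "zip": "836**", "disease": "Headache"},
    {"age": "3*", "sex": "*", "zip": "839**", "disease": "Viral Infection"},
    {"age": "3*", "sex": "*", "zip": "839**", "disease": "Cough"},
]


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


qi = ("age", "sex", "zip")
print(is_k_anonymous(table_iii, qi, k=2))  # True
print(is_k_anonymous(table_iii, qi, k=3))  # False
```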

K-Anonymization [3]
• In general, k-anonymity guarantees that an individual can be associated with his or her real tuple with a probability of at most 1/k.
• While k-anonymity protects against identity disclosure, it does not provide sufficient protection against attribute disclosure.
• Two attacks have been identified: the homogeneity attack and the background knowledge attack.

K-Anonymization [6]
• Suppose Jay knows that Jim is a 26-year-old man whose zip code is 83661. Jay can then conclude that Jim corresponds to a record in the first equivalence class and thus must have a headache. This is the homogeneity attack.
• Suppose that, by knowing Lily's age and zip code, Jay can conclude that Lily corresponds to a record in the last equivalence class. Furthermore, suppose that Jay knows that Lily has a very low risk of viral infection. This background knowledge enables Jay to conclude that Lily most likely has a cough. This is the background knowledge attack.
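Both attacks can be reproduced on Table III by grouping records into equivalence classes and looking at the sensitive values inside each class. This is an illustrative sketch; the function name `sensitive_values_by_class` and the data layout are assumptions.

```python
from collections import defaultdict

# Rows of Table III (2-anonymous table).
table_iii = [
    {"age": "2*", "sex": "M", "zip": "836**", "disease": "Headache"},
    {"age": "2*", "sex": "M", "zip": "836**", "disease": "Headache"},
    {"age": "3*", "sex": "*", "zip": "839**", "disease": "Viral Infection"},
    {"age": "3*", "sex": "*", "zip": "839**", "disease": "Cough"},
]


def sensitive_values_by_class(rows, quasi_identifiers, sensitive):
    """Map each equivalence class to the set of sensitive values it contains."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].add(row[sensitive])
    return dict(classes)


classes = sensitive_values_by_class(table_iii, ("age", "sex", "zip"), "disease")

# Homogeneity attack: a class with a single sensitive value reveals it outright.
homogeneous = {qi: vals for qi, vals in classes.items() if len(vals) == 1}

# Background knowledge attack: ruling out viral infection for Lily
# leaves one candidate in her equivalence class.
lily_candidates = classes[("3*", "*", "839**")] - {"Viral Infection"}

print(homogeneous)
print(lily_candidates)
```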

Proposed Work [5]
• In today's world, privacy is a major concern in protecting sensitive data. People are very concerned about sensitive information that they do not want to share.
• The proposed method combines k-anonymity with a perturbation technique.
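The slides do not give details of the proposed combination. One possible reading, loosely following the randomize-then-generalize order described in the literature survey, is sketched below. The noise range, the bucket format, the function names, and the use of the Table I rows as input are all assumptions of this sketch rather than the proposed method itself.

```python
import random


def perturb(age, rng, spread=2):
    """Perturbation step: shift the age by a small random integer offset."""
    return age + rng.randint(-spread, spread)


def generalize(age, width=10):
    """Generalization step: map an age to a range such as '20-29'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def hybrid(records, seed=0):
    """Perturb each age, then generalize it and mask the zip code."""
    rng = random.Random(seed)
    released = []
    for record in records:
        noisy_age = perturb(record["age"], rng)
        released.append({
            "age": generalize(noisy_age),
            "zip": record["zip"][:3] + "**",
            "disease": record["disease"],
        })
    return released


# Rows of Table I (microdata).
microdata = [
    {"age": 26, "zip": "83661", "disease": "Headache"},
    {"age": 24, "zip": "83634", "disease": "Headache"},
    {"age": 31, "zip": "83967", "disease": "Viral Infection"},
    {"age": 39, "zip": "83949", "disease": "Cough"},
]
released = hybrid(microdata)
for row in released:
    print(row)
```

Any such release should still be checked for k-anonymity afterwards, since perturbation can move a record into a different equivalence class.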

References
[1] Dhanalakshmi, M., and E. Siva Sankari. "Privacy preserving data mining techniques-survey." Information Communication and Embedded Systems (ICICES), 2014 International Conference on. IEEE, 2014.
[2] Saranya, K., K. Premalatha, and S. S. Rajasekar. "A Survey on Privacy Preserving Data Mining." International Journal of Innovations & Advancement in Computer Science 2015, IEEE, 2015.

References (cont.)
[3] Wang, Jian, et al. "A survey on privacy preserving data mining." Database Technology and Applications, 2009 First International Workshop on. IEEE, 2009.
[4] Wang, Jian, et al. "A survey on anonymity-based privacy preserving." E-Business and Information System Security, 2009. EBISS'09. International Conference on. IEEE, 2009.

References (cont.)
[5] Vijayarani, S., A. Tamilarasi, and M. Sampoorna. "Analysis of privacy preserving k-anonymity methods and techniques." Communication and Computational Intelligence (INCOCCI), 2010 International Conference on. IEEE, 2010.
[6] Lohiya, Savita, and Lata Ragha. "Privacy Preserving in Data Mining Using Hybrid Approach." Computational Intelligence and Communication Networks (CICN), 2012 Fourth International Conference on. IEEE, 2012.
