Enhanced Privacy Preserving Access Control in Incremental Data using microaggregation

Indira College of Engineering Management, Pune
ME II COMPUTER
DS-II
Presented by: Mr. Ravi Sharma
Guide: Prof. Manisha Bharati

CONTENT
• Introduction
• Motivation
• Problem Statement
• Literature survey
• System Architecture
• Mathematical Model
• Algorithms
• Result Analysis
• Conclusion
• Future Scope
• References
12 June 2018, Indira College of Engineering Management, Pune

What is the need for privacy preservation?
• Government agencies and other organizations publish medical data and
census data for scientific and research purposes.
• Privacy protection prevents the misuse of the sensitive and
confidential information of data owners.

Introduction
• Files
• Microdata
• Metadata

Attribute types
• Identifier.
• Quasi-identifier or key attributes.
• Confidential attributes.
• Non-confidential attributes.

Microaggregation
• Order records from the initial microdata by an attribute.
• Creation of groups of consecutive values.
• Replacement of such values by the group average.
• The total sum for all Income values remains the same.

Table: Microaggregation for attribute Income and minimum size 3
Rec. ID  Age  Marital status  Income
2        44   single          30,967
4        44   separated       30,967
10       45   single          30,967
1        44   married         47,500
6        45   married         47,500
7        25   separated       47,500
3        55   divorced        73,000
5        55   married         73,000
8        35   single          73,000
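
The three microaggregation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presented system's implementation: the function name `microaggregate_attribute` and the input incomes are hypothetical (the slide shows only the averaged values), and the rule of folding a short last group into the previous one is an assumption.

```python
def microaggregate_attribute(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours."""
    # Step 1: order record positions by the attribute value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Step 2: form groups of k consecutive values.
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Assumption: a final group smaller than k is merged into the previous group.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    # Step 3: replace every value by its group average.
    masked = list(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked


# Hypothetical incomes, invented for illustration only.
incomes = [30000, 31000, 31901, 45000, 47500, 50000, 70000, 73000, 76000]
masked = microaggregate_attribute(incomes, k=3)
print(masked)
print(sum(incomes), sum(masked))  # the total is preserved
```

Because each group is replaced by its own mean, the group sums (and therefore the overall sum) do not change, which is the property stated in the last bullet.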

Sensitive Table
     QI1  QI2  S1
ID   Age  Zip  Salary
1    5    15   20K
2    15   28   30K
3    28   45   50K
4    25   60   10K
5    38   74   20K
6    32   89   50K

Data Anonymization
• The use of one or more techniques designed to make it impossible or
at least more difficult to identify a particular individual from stored
data related to them.

Non-anonymized database consisting of the Income records
Name     Age  Gender  Country        Marital status      Salary
Sam      29   Female  United-States  Divorced            20K
Robert   24   Female  Germany        Never-married       50K
Jack     28   Female  Mexico         married             20K
Sunny    27   Male    United-States  married             50K
Jackson  24   Female  Germany        Married-civ-spouse  10K
Suresh   23   Male    United-States  married             20K
Albert   19   Male    Thailand       married             30K
Shone    29   Male    Philippines    Married-civ-spouse  50K
Johnson  17   Male    Portugal       Divorced            10K
John     19   Male    Canada         married             20K

Applying Generalization & Suppression
Name  Age            Gender  Country        Marital status  Salary
*     20 < Age ≤ 30  Female  United-States  *               20K
*     20 < Age ≤ 30  Female  Germany        *               50K
*     20 < Age ≤ 30  Female  United-States  *               20K
*     20 < Age ≤ 30  Male    United-States  *               50K
*     20 < Age ≤ 30  Female  Germany        *               10K
*     20 < Age ≤ 30  Male    United-States  *               20K
*     Age ≤ 20       Male    Thailand       *               30K
*     20 < Age ≤ 30  Male    Philippines    *               50K
*     Age ≤ 20       Male    Portugal       *               10K
*     Age ≤ 20       Male    Canada         *               20K
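
The transformation between the two tables can be sketched as follows. This is only an illustration of generalization (Age mapped to a range) and suppression (Name and Marital status replaced by `*`); the function names `generalize_age` and `anonymize` are hypothetical, and the `Age > 30` bucket is an assumption, since no record in the table exceeds 30.

```python
def generalize_age(age):
    """Generalization: map an exact age to the range used in the table."""
    if age <= 20:
        return "Age ≤ 20"
    if age <= 30:
        return "20 < Age ≤ 30"
    return "Age > 30"  # assumed bucket, not shown in the slide


def anonymize(record):
    """Suppress the identifier and marital status; generalize the age."""
    masked = dict(record)
    masked["Name"] = "*"            # suppression
    masked["Marital status"] = "*"  # suppression
    masked["Age"] = generalize_age(record["Age"])
    return masked


row = {"Name": "Albert", "Age": 19, "Gender": "Male",
       "Country": "Thailand", "Marital status": "married", "Salary": "30K"}
print(anonymize(row))
```

The sensitive attribute (Salary) is left untouched; only the identifier and quasi-identifiers are masked.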

k-anonymity is vulnerable to two types of attacks:
(i) Homogeneity attack
(ii) Background knowledge attack

t-Closeness
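• t-closeness protects against disclosure of the sensitive attributes
(attribute disclosure).
• The distribution of the sensitive attribute within each quasi-identifier
group should be "close" to its distribution in the entire original
database: the distance between the two distributions must not exceed
a threshold t.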

• EMD(P,Q) measures the cost of transforming one distribution P into another
distribution Q by moving probability mass. EMD is computed as the
minimum transportation cost from the bins of P to the bins of Q, so it
depends on how much mass is moved and how far it is moved.
• If the numerical attribute takes values {v1, v2, ..., vm}, where vi < vj if i < j,
then ordered_distance(vi, vj) = |i - j| / (m - 1). If P and Q are
distributions over {v1, v2, ..., vm} that assign probability pi and
qi, respectively, to vi, then the EMD for the ordered distance can be computed as
• $EMD(P,Q) = \frac{1}{m-1} \sum_{i=1}^{m} \left| \sum_{j=1}^{i} (p_j - q_j) \right|$

t-Closeness
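• Earth Mover's Distance (EMD)
• Works on ordered (numerical) attributes.
Let $r_i = p_i - q_i$ (i = 1, 2, ..., m). The EMD between P and Q can then be calculated as
$EMD(P,Q) = \frac{1}{m-1} \left( |r_1| + |r_1 + r_2| + \cdots + |r_1 + r_2 + \cdots + r_{m-1}| \right) = \frac{1}{m-1} \sum_{i=1}^{m} \left| \sum_{j=1}^{i} r_j \right|$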
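
A short sketch of the ordered-distance EMD, written from the standard closed form (the normalized sum of absolute cumulative differences r_1, r_1 + r_2, ...). The function name `ordered_emd` and the two example distributions are illustrative assumptions, not part of the presented system.

```python
def ordered_emd(p, q):
    """EMD between distributions p and q over the same ordered domain v1 < ... < vm."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same domain")
    m = len(p)
    if m < 2:
        return 0.0
    cumulative = 0.0  # running sum r_1 + ... + r_i, with r_i = p_i - q_i
    total = 0.0
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i
        total += abs(cumulative)
    return total / (m - 1)


# Illustrative example: Q is uniform over 9 ordered values,
# P puts all its mass on the 3 smallest values.
q = [1 / 9] * 9
p = [1 / 3] * 3 + [0.0] * 6
print(round(ordered_emd(p, q), 3))  # 0.375
```

A group whose sensitive values are concentrated at one end of the domain, as P is here, gets a large EMD and would violate t-closeness for any t below 0.375.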

Access Control Mechanism
• A user has access to an object based on the assigned role.
• Access Control is a set of controls to restrict access to certain
resources.

Motivation
• Resource protection
• Threat
• Information sharing
• Security: why?

Problem Statement
• A microdata set is a data set that carries sensitive information about
individual respondents, such as persons or enterprises. The problem is to
prevent access to sensitive data within a given data set in which one
subset of the data is public and the remaining subset is private or protected.

Objectives
• To identify the strong points of microaggregation for achieving
k-anonymous t-closeness.
• To improve the utility of the anonymized data set.
• To provide additional masking freedom.
• To increase data granularity.
• To reduce the impact of outliers.

Literature Survey
1. Paper: t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation, IEEE Transactions on Knowledge and Data Engineering, 11, November 2015.
   Aim of the paper: Microaggregation has several advantages over generalization/recoding for k-anonymity that are mostly related to data utility preservation.
   Advantages: Increasing data granularity and avoiding discretization of numerical data.
   Disadvantages: Privacy and security issues for anonymized data.
2. Paper: Jianneng Cao, Panagiotis Karras, Panos Kalnis, Kian-Lee Tan, SABRE: a Sensitive Attribute Bucketization and Redistribution framework for t-closeness, May 2009.
   Aim of the paper: The need for microdata privacy; covers the gap with SABRE.
   Advantages: SABRE provides the best known resolution of the tradeoff between privacy, information quality, and computational efficiency with a t-closeness guarantee in mind.
   Disadvantages: A greater number of buckets leads to equivalence classes with more records and thus to more information loss.

Literature Survey
3. Paper: Josep Domingo-Ferrer, Hybrid microdata using microaggregation, 10 April 2010.
   Aim of the paper: The method combines microaggregation and any synthetic data generator to produce hybrid microdata sets that can be released with low disclosure risk and acceptable data utility.
   Advantages: Preservation of means, variances, covariances, and third-order and fourth-order central moments; this feature was not open by the current hybrid
4. Paper: Josep Domingo-Ferrer, Jordi Soria-Comas, From t-closeness to differential privacy and vice versa in data anonymization, 11 November 2014.
   Aim of the paper: Data set anonymization.
   Advantages: k-anonymity, t-closeness and ε-differential privacy.
   Disadvantages: Prior bucketization of the values of the confidential attribute is required.
5. Paper: Ninghui Li, Tiancheng Li, Suresh Venkatasubramanian, t-Closeness: Privacy Beyond k-Anonymity and l-Diversity, 2007 IEEE.
   Aim of the paper: k-anonymity protects against identity disclosure.
   Advantages: The distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table.
   Disadvantages: It does not provide sufficient protection against attribute disclosure.

System Architecture
Fig: System Architecture (flow of architecture)
Components shown in the figure: Sensitive table; Preprocessing; K-Anonymity; Privacy Requirement; Imprecision bound; Query Predicate; Access Control; Partitioning (Top Down Heuristic 3 Algorithm); Anonymized Table; Anonymized Table with Bound (output); Database; Access Data; Administrator; User; Server.

Activity Diagram

Modules
• Dataset Extraction
• Preprocessing
• Cluster Formation
• Anonymization
• Partitioning Using Heuristics Mechanism
• Incremental Database Anonymization and Partitioning

Mathematical Model
S: System.
A system is defined as a set such that S = {I, P, O}, where
I: Set of inputs.
P: Set of processes.
O: Set of outputs.
U: Set of users.
UR: Set of registered users; UN: Set of unregistered users.

INPUT SET DETAILS:
1. PHASE 1: Registration. Ir = {username: ir1, Address: ir2, Pincode: ir3, Country: ir4, Gender: ir5}
2. PHASE 2: Data Processing. Ie = {userinfo: iv1, searchFeatures: iv2, Featurelist: iv3}
3. PHASE 3: Result. Id = {userinfo: iv1, Feature-Key: iv2}

PROCESS SET DETAILS:
1. PHASE 1: Registration. P1 = {User Registration: p11}
2. PHASE 2: Data Processing. P2 = {feature selection: p21, Data extraction: p22}
3. PHASE 3: Result. P3 = {view mining result: p31, User Verification: p32}

OUTPUT SET DETAILS:
1. PHASE 1: Registration. O1 = {userid: o11, Password: o12}
2. PHASE 2: Data Processing. O2 = {FeatureData: o21}
3. PHASE 3: Result.
Success Conditions:

Algorithm
• Algorithm 1: t-Closeness through microaggregation and merging of microaggregated groups of records.
• Data: X: original data set
•   k: minimum cluster size
•   t: t-closeness level
• Result: Set of clusters satisfying k-anonymity and t-closeness
• X′ = microaggregation(X, k)
• while EMD(X′, X) > t do
•   C = cluster in X′ with the greatest EMD to X
•   C′ = cluster in X′ closest to C in terms of QIs
•   X′ = merge C and C′ in X′
• end while
• return X′
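
The loop of Algorithm 1 can be sketched in Python as below. This is a simplified illustration under several assumptions, not the implementation used in the project: each record is a hypothetical `(quasi_identifier, sensitive_value)` pair with a single numeric quasi-identifier, the initial microaggregation simply sorts by that quasi-identifier, "closest in terms of QIs" is taken as the nearest cluster centroid, and the condition EMD(X′, X) > t is read as "some cluster has EMD greater than t". The sample records reuse Age and Salary (in thousands) from the Sensitive Table.

```python
from collections import Counter


def ordered_emd(p, q):
    """Ordered-distance EMD between two distributions over the same domain."""
    m = len(p)
    if m < 2:
        return 0.0
    cumulative, total = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i
        total += abs(cumulative)
    return total / (m - 1)


def cluster_emd(cluster, records):
    """EMD between the sensitive values of a cluster and of the whole data set."""
    domain = sorted({s for _, s in records})

    def distribution(rows):
        counts = Counter(s for _, s in rows)
        return [counts[v] / len(rows) for v in domain]

    return ordered_emd(distribution(cluster), distribution(records))


def microaggregate(records, k):
    """Assumed initial step: sort by QI and cut into groups of at least k records."""
    ordered = sorted(records)
    clusters = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        tail = clusters.pop()
        clusters[-1].extend(tail)
    return clusters


def centroid(cluster):
    return sum(qi for qi, _ in cluster) / len(cluster)


def t_closeness_by_merging(records, k, t):
    clusters = microaggregate(records, k)
    while len(clusters) > 1:
        # C: the cluster whose sensitive distribution is farthest from X.
        worst = max(clusters, key=lambda c: cluster_emd(c, records))
        if cluster_emd(worst, records) <= t:
            break  # every cluster already satisfies t-closeness
        # C': the cluster closest to C in terms of the quasi-identifier.
        others = [c for c in clusters if c is not worst]
        nearest = min(others, key=lambda c: abs(centroid(c) - centroid(worst)))
        clusters = [c for c in others if c is not nearest] + [worst + nearest]
    return clusters


# (Age, Salary in thousands) taken from the Sensitive Table slide.
records = [(5, 20), (15, 30), (28, 50), (25, 10), (38, 20), (32, 50)]
clusters = t_closeness_by_merging(records, k=2, t=0.2)
for c in clusters:
    print(c, round(cluster_emd(c, records), 3))
```

Merging can only grow clusters, so k-anonymity is never lost, and in the worst case everything collapses into a single cluster whose EMD to X is zero, which guarantees termination.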

Algorithm
• Top-Down Heuristic Algorithm:
• Input: T, K, Q and BQi, where T is the set of tuples, K is the cluster size, Q is the set of queries, and BQi is the imprecision bound for query Qi
• Output: P, the output partitions
• STEPS:
1. Initialize the set of candidate partitions: CP <- T
2. For each partition in CP, do the following:
a) Find the queries that overlap the partition.
b) Select the queries with the least imprecision bound (IB), where IB > 0.
c) Among them, select the query with the smallest bound.
d) Create a query cut.
e) If the cut does not produce a skewed partition, the cut is feasible: add the result to CP.
Otherwise, reject the cut.
3. Return P.
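
The imprecision that the bound BQi limits can be illustrated with a small sketch. It assumes the usual definition for this kind of access control: the imprecision of a query is the number of tuples in all partitions that overlap the query region minus the number of tuples that actually satisfy the query. The helper names, the two partitions, and the query ranges are hypothetical; the points are the (Age, Zip) pairs from the Sensitive Table.

```python
def bounding_box(partition):
    """Per-attribute (min, max) ranges published for an anonymized partition."""
    return tuple((min(col), max(col)) for col in zip(*partition))


def overlaps(box, query):
    """True if two axis-aligned boxes with closed ranges intersect."""
    return all(b_lo <= q_hi and q_lo <= b_hi
               for (b_lo, b_hi), (q_lo, q_hi) in zip(box, query))


def satisfies(point, query):
    """True if the original tuple lies inside the query ranges."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, query))


def imprecision(partitions, query):
    # Tuples returned on anonymized data: every tuple of each overlapping partition.
    returned = sum(len(p) for p in partitions
                   if overlaps(bounding_box(p), query))
    # Tuples that the same query would return on the original data.
    exact = sum(satisfies(pt, query) for p in partitions for pt in p)
    return returned - exact


# (Age, Zip) pairs from the Sensitive Table, split into two assumed partitions.
partitions = [
    [(5, 15), (15, 28), (28, 45)],
    [(25, 60), (38, 74), (32, 89)],
]
query = ((0, 20), (10, 30))  # hypothetical predicate: Age in [0, 20], Zip in [10, 30]
print(imprecision(partitions, query))  # 1
```

A query is permitted only if its imprecision stays within its bound, which is why the heuristic prefers cuts along queries with the smallest bounds.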

Software Requirements
• Operating System : Windows 7
• Coding Language : JAVA
• Front-End : NetBeans 8.0.2
• Database : MySQL 5.0
• Framework : JDK 1.8

Advantages of System
• Easy sharing of privacy-sensitive data for analysis: business-to-business,
entity-to-entity and government-to-government.
• The anonymity technique can be used with an access control
mechanism to ensure both security and privacy of the sensitive
information.

Application
• Electronic commerce results in the automated collection of large
amounts of consumer data. These data, which are gathered by many
companies, are shared with subsidiaries and partners.
• Health care is a very sensitive sector with strict regulations. Protected
health information must be strictly regulated when it is used in medical
research.

Result Analysis

Conclusion
• The access control mechanism allows only authorized query predicates
on sensitive data.
• The proposed work uses microaggregation as a method to attain
k-anonymous t-closeness.
• It maintains data in a secure manner.

Future Scope
• The clustering parameters can be chosen dynamically through the
k-means algorithm.

References
[1] R. Brand, J. Domingo-Ferrer, and J. M. Mateo-Sanz, “Reference data sets to test and compare SDC methods for protection of
numerical microdata,” European Project IST-2000-25069 CASC, 2002. [Online]. Available: http://neon.vb.cbs.nl/casc/CASCtestsets.htm
[2] J. Cao, P. Karras, P. Kalnis, and K.-L. Tan, “SABRE: A sensitive attribute Bucketization and Redistribution framework for
t-closeness,” VLDB J., vol. 20, no. 1, pp. 59–81, 2011.
[3] D. Defays and P. Nanopoulos, “Panels of enterprises and confidentiality: The small aggregates method,” in Proc. Symp. Design
Anal. Longitudinal Surveys, 1992, pp. 195–204.
[4] J. Domingo-Ferrer and V. Torra, “A quantitative comparison of disclosure control methods for microdata,” in Confidentiality,
Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies. L. Zayatz, P. Doyle, J. Theeuwes, and J.
Lane, Eds. Amsterdam, The Netherlands: North Holland, 2001, pp. 111–134.
[5] J. Domingo-Ferrer and J. M. Mateo-Sanz, “Practical data-oriented microaggregation for statistical disclosure control,” IEEE Trans.
Knowl. Data Eng., vol. 14, no. 1, pp. 189–201, Jan./Feb. 2002.

References
[6] J. Domingo-Ferrer and U. González-Nicolás, “Hybrid microdata using microaggregation,” Inf. Sci., vol. 180, no. 15,
pp. 2834–2844, 2010.
[7] J. Domingo-Ferrer, D. Sánchez, and G. Rufian-Torrell, “Anonymization of nominal data based on semantic
marginality,” Inf. Sci., vol. 242, pp. 35–48, 2013.
[8] J. Domingo-Ferrer and J. Soria-Comas, “From t-closeness to differential privacy and vice versa in data
anonymization,” Knowl.- Based Syst., vol. 74, pp. 151–158, 2015.
[9] J. Domingo-Ferrer and V. Torra, “Ordinal, continuous and heterogeneous k-anonymity through microaggregation,”
Data Mining Knowl. Discovery, vol. 11, no. 2, pp. 195–212, 2005.
[10] C. Dwork, “Differential privacy,” in Proc. 33rd Int. Colloquium Automata, Languages Programm., 2006, pp. 1–12.

Any Questions?

THANK YOU
