1. Indira College of Engineering Management, Pune
Enhanced Privacy Preserving Access Control in Incremental Data Using Microaggregation
ME II COMPUTER
DS-II
Presented by: Mr. Ravi Sharma
Guide: Prof. Manisha Bharati
2. CONTENT
• Introduction
• Motivation
• Problem Statement
• Literature survey
• System Architecture
• Mathematical Model
• Algorithms
• Result Analysis
• Conclusion
• Future Scope
• References
12 June 2018 Indira College of Engineering Management, Pune 2
3. Why is privacy preservation needed?
• Government agencies and other organizations publish medical and census data for scientific and research purposes.
• Privacy protection prevents misuse of the sensitive and confidential information of data owners.
5. Attribute types
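• Identifiers: attributes that unambiguously identify the respondent (e.g., name).
• Quasi-identifiers or key attributes: attributes that, in combination, can be linked with external information to re-identify the respondent (e.g., age, zip code).
• Confidential attributes: attributes that contain sensitive information about the respondent (e.g., salary, diagnosis).
• Non-confidential attributes: all remaining attributes.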
6. Microaggregation
• Order the records of the original microdata by an attribute.
• Form groups of consecutive values, each containing at least k records.
• Replace each value by its group average.
• The table shows microaggregation of the attribute Income with minimum group size k = 3.
• The total sum of all Income values remains unchanged.
Rec. ID Age marital status Income
2 44 single 30,967
4 44 separated 30,967
10 45 single 30,967
1 44 married 47,500
6 45 married 47,500
7 25 separated 47,500
3 55 divorced 73,000
5 55 married 73,000
8 35 single 73,000
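The steps above can be sketched in Python as fixed-size univariate microaggregation. This is a minimal sketch, not the project's implementation: it assumes a single numerical attribute and groups of exactly k consecutive values, with any leftover values (fewer than k) merged into the last group. The nine incomes are hypothetical values chosen so that the group averages reproduce the table; the slide does not show the original incomes.

```python
from statistics import mean


def microaggregate(values, k):
    """Univariate fixed-size microaggregation (illustrative sketch).

    Sorts the values, forms groups of k consecutive values (leftovers join
    the last group so every group has at least k members), and replaces
    each value by its group average. Returns masked values in the original
    record order.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        leftover = groups.pop()
        groups[-1].extend(leftover)
    masked = [0.0] * len(values)
    for group in groups:
        avg = mean(values[i] for i in group)
        for i in group:
            masked[i] = avg
    return masked


# Hypothetical incomes (chosen so the group means match the slide's table).
incomes = [25000, 31000, 36900, 40000, 48000, 54500, 60000, 75000, 84000]
masked = microaggregate(incomes, 3)
print([round(v) for v in masked])
# [30967, 30967, 30967, 47500, 47500, 47500, 73000, 73000, 73000]
print(sum(incomes), sum(masked))  # totals agree up to float rounding
```

Because each value is replaced by the mean of its own group, every group sum, and therefore the total, is unchanged, which is the property stated on the slide.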
7. Sensitive Table
ID Age (QI1) Zip (QI2) Salary (S1)
1 5 15 20K
2 15 28 30K
3 28 45 50K
4 25 60 10K
5 38 74 20K
6 32 89 50K
8. Data Anonymization
• The use of one or more techniques designed to make it impossible or
at least more difficult to identify a particular individual from stored
data related to them.
9. Non-anonymized database of income records
Name Age Gender Country Marital status Salary
Sam 29 Female United-States Divorced 20K
Robert 24 Female Germany Never-married 50K
Jack 28 Female Mexico married 20K
sunny 27 Male United-States married 50K
Jackson 24 Female Germany Married-civ-spouse 10K
Suresh 23 Male United-States married 20K
Albert 19 Male Thailand married 30K
Shone 29 Male Philippines Married-civ-spouse 50K
Johnson 17 Male Portugal Divorced 10K
John 19 Male Canada married 20K
10. Applying Generalization & Suppression
Name Age Gender Country Marital status Salary
* 20 < Age ≤ 30 Female United-States * 20K
* 20 < Age ≤ 30 Female Germany * 50K
* 20 < Age ≤ 30 Female Mexico * 20K
* 20 < Age ≤ 30 Male United-States * 50K
* 20 < Age ≤ 30 Female Germany * 10K
* 20 < Age ≤ 30 Male United-States * 20K
* Age ≤ 20 Male Thailand * 30K
* 20 < Age ≤ 30 Male Philippines * 50K
* Age ≤ 20 Male Portugal * 10K
* Age ≤ 20 Male Canada * 20K
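The transformation above can be reproduced with a short Python sketch: it suppresses Name and Marital status with "*", generalizes Age into the two ranges used on the slide, and measures the k-anonymity level as the size of the smallest equivalence class. The rows are copied from the non-anonymized table. Which attributes act as quasi-identifiers is an assumption here (Age and Gender), the helper names are hypothetical, and the "Age > 30" band is an assumed extra range that this data never needs.

```python
from collections import Counter

# Rows of the non-anonymized table:
# (name, age, gender, country, marital status, salary)
RAW = [
    ("Sam", 29, "Female", "United-States", "Divorced", "20K"),
    ("Robert", 24, "Female", "Germany", "Never-married", "50K"),
    ("Jack", 28, "Female", "Mexico", "married", "20K"),
    ("sunny", 27, "Male", "United-States", "married", "50K"),
    ("Jackson", 24, "Female", "Germany", "Married-civ-spouse", "10K"),
    ("Suresh", 23, "Male", "United-States", "married", "20K"),
    ("Albert", 19, "Male", "Thailand", "married", "30K"),
    ("Shone", 29, "Male", "Philippines", "Married-civ-spouse", "50K"),
    ("Johnson", 17, "Male", "Portugal", "Divorced", "10K"),
    ("John", 19, "Male", "Canada", "married", "20K"),
]


def generalize_age(age):
    """Replace an exact age by a range (the 'Age > 30' band is assumed)."""
    if age <= 20:
        return "Age ≤ 20"
    if age <= 30:
        return "20 < Age ≤ 30"
    return "Age > 30"


def anonymize(row):
    """Suppress Name and Marital status with '*' and generalize Age."""
    _name, age, gender, country, _marital, salary = row
    return ("*", generalize_age(age), gender, country, "*", salary)


def k_anonymity(rows, qi_columns):
    """Size of the smallest group of rows sharing the same QI values."""
    classes = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return min(classes.values())


anonymized = [anonymize(row) for row in RAW]
print(k_anonymity(anonymized, (1, 2)))  # QIs assumed: Age range, Gender -> 3
```

Under the assumed quasi-identifiers the classes are (20 < Age ≤ 30, Female) with 4 rows, (20 < Age ≤ 30, Male) with 3 rows, and (Age ≤ 20, Male) with 3 rows, so the result is 3-anonymous.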
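11. Two types of attacks on k-anonymity:
(i) Homogeneity attack: all records in an equivalence class share the same sensitive value, so that value is disclosed even though the individual record cannot be singled out.
(ii) Background knowledge attack: an attacker uses side information about the victim to rule out some sensitive values and narrow down the one that applies.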
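A homogeneity attack can be detected mechanically: group the records by their quasi-identifier values and flag every equivalence class whose records all carry the same sensitive value. The sketch below assumes records are Python dicts; the table, column names, and function name are hypothetical. A background knowledge attack depends on what the attacker knows, so it cannot be detected from the table alone and is not covered here.

```python
from collections import defaultdict


def homogeneous_classes(rows, qi, sensitive):
    """Return the QI values of equivalence classes in which every record has
    the same sensitive value (vulnerable to the homogeneity attack)."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[a] for a in qi)].add(row[sensitive])
    return [key for key, values in classes.items() if len(values) == 1]


# Hypothetical 3-anonymous table (age range and zip prefix are the QIs).
rows = [
    {"age": "20-30", "zip": "476**", "disease": "Heart disease"},
    {"age": "20-30", "zip": "476**", "disease": "Heart disease"},
    {"age": "20-30", "zip": "476**", "disease": "Heart disease"},
    {"age": "30-40", "zip": "479**", "disease": "Flu"},
    {"age": "30-40", "zip": "479**", "disease": "Cancer"},
    {"age": "30-40", "zip": "479**", "disease": "Gastritis"},
]
print(homogeneous_classes(rows, ("age", "zip"), "disease"))
# [('20-30', '476**')]: anyone known to be in this class has heart disease
```

The table is 3-anonymous, yet the first class reveals its sensitive value, which is why k-anonymity alone does not prevent attribute disclosure.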
12. t-Closeness
t-Closeness protects sensitive attributes against attribute disclosure.
The distribution of the sensitive attribute within each quasi-identifier group should be "close" to its distribution in the entire original database: the distance between the two distributions must not exceed a threshold t.
13. • EMD(P,Q) measures the cost of transforming one distribution P into another
distribution Q by moving probability mass. EMD is computed as the
minimum transportation cost from the bins of P to the bins of Q, so it
depends on how much mass is moved and how far it is moved.
• If the numerical attribute takes values $\{v_1, v_2, \ldots, v_m\}$, where $v_i < v_j$ if $i < j$, then the ordered distance is $\text{ordered\_dist}(v_i, v_j) = \frac{|i - j|}{m - 1}$. If P and Q are distributions over $\{v_1, v_2, \ldots, v_m\}$ that assign probabilities $p_i$ and $q_i$, respectively, to $v_i$, then the EMD for the ordered distance can be computed as
• $\mathrm{EMD}(P,Q) = \frac{1}{m-1}\sum_{i=1}^{m}\left|\sum_{j=1}^{i}(p_j - q_j)\right|$
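As a worked example with illustrative values (not taken from this project's data), let the sensitive attribute take m = 9 ordered salary values 3K, 4K, ..., 11K, each with probability 1/9 in the whole table (distribution Q), and let one equivalence class contain only 3K, 4K and 5K, so its distribution P assigns 1/3 to each of v1, v2, v3 and 0 elsewhere. With r_i = p_i − q_i, the EMD for the ordered distance is the sum of the absolute cumulative sums of r_i divided by m − 1:

```latex
\begin{aligned}
r &= \left(\tfrac{2}{9}, \tfrac{2}{9}, \tfrac{2}{9}, -\tfrac{1}{9}, -\tfrac{1}{9}, -\tfrac{1}{9}, -\tfrac{1}{9}, -\tfrac{1}{9}, -\tfrac{1}{9}\right) \\
\Big(\textstyle\sum_{j=1}^{i} r_j\Big)_{i=1}^{9} &= \left(\tfrac{2}{9}, \tfrac{4}{9}, \tfrac{6}{9}, \tfrac{5}{9}, \tfrac{4}{9}, \tfrac{3}{9}, \tfrac{2}{9}, \tfrac{1}{9}, 0\right) \\
\mathrm{EMD}(P,Q) &= \frac{1}{m-1}\sum_{i=1}^{m}\left|\sum_{j=1}^{i} r_j\right| = \frac{1}{8}\cdot\frac{27}{9} = \frac{3}{8} = 0.375
\end{aligned}
```

Because all of the class's probability mass sits at the low end of the domain, the distance is large; under a hypothetical threshold t = 0.2 this class would violate t-closeness.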
14. t-Closeness
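• Earth Mover's Distance (EMD)
• For numerical (ordered) attributes,
let $r_i = p_i - q_i$ $(i = 1, 2, \ldots, m)$;
then the EMD between P and Q can be calculated as
$\mathrm{EMD}(P,Q) = \frac{1}{m-1}\left(|r_1| + |r_1 + r_2| + \cdots + |r_1 + r_2 + \cdots + r_{m-1}|\right) = \frac{1}{m-1}\sum_{i=1}^{m}\left|\sum_{j=1}^{i} r_j\right|$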
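The formula can be checked with a small Python sketch that computes the EMD for the ordered distance from the cumulative sums of r_i and then tests t-closeness for a set of equivalence classes. Exact fractions are used only to avoid rounding noise. The salary domain (3K to 11K, uniform over the table), the two classes, and the function names are illustrative assumptions, not data or code from this project.

```python
from fractions import Fraction


def emd_ordered(p, q):
    """EMD between distributions p and q over the same m ordered values,
    using ordered distance |i - j| / (m - 1):
    EMD = (1 / (m - 1)) * sum_i |r_1 + ... + r_i|, where r_i = p_i - q_i.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be over the same domain")
    m = len(p)
    if m < 2:
        return 0
    cumulative = 0
    total = 0
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i
        total += abs(cumulative)
    return total / (m - 1)


def satisfies_t_closeness(class_dists, table_dist, t):
    """True if every equivalence-class distribution is within EMD t of the table."""
    return all(emd_ordered(p, table_dist) <= t for p in class_dists)


# Illustrative example: salaries 3K..11K (m = 9), uniform over the whole table.
Q = [Fraction(1, 9)] * 9
P1 = [Fraction(1, 3)] * 3 + [Fraction(0)] * 6  # class {3K, 4K, 5K}
P2 = [Fraction(1, 3) if i in (3, 5, 8) else Fraction(0) for i in range(9)]  # {6K, 8K, 11K}
print(emd_ordered(P1, Q))  # 3/8
print(emd_ordered(P2, Q))  # 1/6
print(satisfies_t_closeness([P1, P2], Q, 0.2))  # False: P1 is too far from Q
```

A class whose values are spread across the domain (P2) stays much closer to the overall distribution than one concentrated at one end (P1), which is exactly what the ordered distance is meant to capture.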
15. Access Control Mechanism
• A user has access to an object based on the role assigned to that user (role-based access control).
• Access control is a set of controls that restrict access to certain resources.
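A minimal role-based access control check can be sketched in Python: permissions are granted to roles, users are assigned roles, and a request is allowed only if the user's role holds the requested permission. The class, role, user, and table names below are hypothetical; a real deployment would also authenticate users and persist the policy.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    """Minimal role-based access control (illustrative sketch)."""
    # role -> set of (object, action) permissions
    permissions: dict = field(default_factory=dict)
    # user -> role
    roles: dict = field(default_factory=dict)

    def grant(self, role, obj, action):
        self.permissions.setdefault(role, set()).add((obj, action))

    def assign(self, user, role):
        self.roles[user] = role

    def check(self, user, obj, action):
        """Allow the request only if the user's role holds the permission."""
        role = self.roles.get(user)
        return role is not None and (obj, action) in self.permissions.get(role, set())


acl = AccessControl()
acl.grant("analyst", "anonymized_table", "read")
acl.grant("administrator", "sensitive_table", "read")
acl.assign("alice", "analyst")
acl.assign("bob", "administrator")
print(acl.check("alice", "anonymized_table", "read"))  # True
print(acl.check("alice", "sensitive_table", "read"))   # False
```

Keeping permissions on roles rather than on individual users means that changing what analysts may see is a single policy update.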
16. Motivation
Figure: Why security? Resource protection, threats, and information sharing.
17. Problem Statement
• A microdata set is a data set that carries sensitive information about individual respondents, such as persons or enterprises. The goal is to prevent access to sensitive data within a given data set in which one subset of the data is public and the remaining subset is private or protected.
18. Objectives
• To identify the strengths of microaggregation for achieving k-anonymous t-closeness.
• To improve the utility of the anonymized data set.
• To provide additional masking freedom and improve data utility.
• To increase data granularity.
• To reduce the impact of outliers.
19. Literature Survey
1. Paper: t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation, IEEE Transactions on Knowledge and Data Engineering, no. 11, November 2015.
   Aim: Microaggregation has several advantages over generalization/recoding for k-anonymity, mostly related to data utility preservation.
   Advantages: Increases data granularity and avoids discretization of numerical data.
   Disadvantages: Privacy and security issues of the anonymized data.
2. Paper: Jianneng Cao, Panagiotis Karras, Panos Kalnis, and Kian-Lee Tan propose SABRE: A Sensitive Attribute Bucketization and Redistribution Framework for t-Closeness, May 2009.
   Aim: Addresses the need for microdata privacy and covers the gap with SABRE.
   Advantages: SABRE provides the best known resolution of the tradeoff between privacy, information quality, and computational efficiency with a t-closeness guarantee in mind.
   Disadvantages: A greater number of buckets leads to equivalence classes with more records and thus to more information loss.
20. Literature Survey
3. Paper: Josep Domingo-Ferrer, Hybrid Microdata Using Microaggregation, 10 April 2010.
   Aim: A method that combines microaggregation with any synthetic data generator.
   Advantages: Produces hybrid microdata sets that can be released with low disclosure risk and acceptable data utility.
   Disadvantages: Preservation of means, variances, covariances, and third- and fourth-order central moments was not offered by the current hybrid methods.
4. Paper: Josep Domingo-Ferrer and Jordi Soria-Comas, From t-Closeness to Differential Privacy and Vice Versa in Data Anonymization, 11 November 2014.
   Aim: Data set anonymization.
   Advantages: Covers k-anonymity, t-closeness, and ε-differential privacy.
   Disadvantages: Prior bucketization of the values of the confidential attribute is required.
5. Paper: Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian, t-Closeness: Privacy Beyond k-Anonymity and l-Diversity, IEEE, 2007.
   Aim: k-anonymity protects against identity disclosure.
   Advantages: The distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table.
   Disadvantages: k-anonymity does not provide sufficient protection against attribute disclosure.
22. System Architecture
Figure: Flow of architecture. Components: administrator, user, server, database; sensitive table; preprocessing with k-anonymity; privacy requirement; imprecision bound; query predicate; access control; partitioning (Top-Down Heuristic 3 algorithm); anonymized table; anonymized table with bound (output).
24. Modules
• Dataset Extraction
• Preprocessing
• Cluster Formation
• Anonymization
• Partitioning Using a Heuristic Mechanism
• Incremental Database Anonymization and Partitioning
25. Mathematical Model
S: System;
A system is defined as a set S = {I, P, O},
where
U: Set of users
UR: Set of registered users, UN: Set of unregistered users
I: Set of inputs
O: Set of outputs
P: Set of processes
• INPUT SET DETAILS:
26. • 1. PHASE 1: REGISTRATION. Ir = {username: ir1, address: ir2, pincode: ir3, country: ir4, gender: ir5}
27. 2. PHASE 2: Data Processing
Ie = {userinfo: iv1, searchFeatures: iv2, Featurelist: iv3}
3. PHASE 3: Result. Id = {userinfo: iv1, Feature-Key: iv2}
PROCESS SET DETAILS:
1. PHASE 1: REGISTRATION.
P1 = {User Registration: p11}
2. PHASE 2: Data Processing
P2 = {feature selection: p21, data extraction: p22}
28. 3. PHASE 3: Result. P3 = {view mining result: p31, user verification: p32}
OUTPUT SET DETAILS:
1. PHASE 1: REGISTRATION. O1 = {userid: o11, password: o12}
2. PHASE 2: Data Processing
O2 = {FeatureData: o21}
3. PHASE 3: Result
Success Conditions:
29. Algorithm
• Algorithm 1: t-closeness through microaggregation and merging of microaggregated groups of records.
• Data: X: original data set
• k: minimum cluster size
• t: t-closeness level
• Result: set of clusters satisfying k-anonymity and t-closeness
• X′ = microaggregation(X, k)
• while EMD(X′, X) > t do (i.e., while some cluster of X′ has an EMD to X greater than t)
• C = cluster in X′ with the greatest EMD to X
• C′ = cluster in X′ closest to C in terms of QIs
• X′ = merge C and C′ in X′
• end while
• return X′
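Algorithm 1 can be sketched in Python under simplifying assumptions: a single numerical quasi-identifier (so "closest in terms of QIs" becomes the distance between cluster means), a single ordered sensitive attribute, and fixed-size microaggregation in which leftover records join the last group. EMD(X′, X) is taken as the largest EMD between any cluster and the whole data set. The records and function names are hypothetical; this is an illustration of the merge loop, not the project's code.

```python
from statistics import mean


def emd_ordered(p, q):
    """EMD over an ordered domain of m values (ordered distance |i-j|/(m-1))."""
    m = len(p)
    if m < 2:
        return 0.0
    cumulative = total = 0.0
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i
        total += abs(cumulative)
    return total / (m - 1)


def distribution(values, domain):
    """Relative frequency of each domain value among `values`."""
    return [values.count(v) / len(values) for v in domain]


def microaggregation(records, k):
    """Fixed-size univariate microaggregation on the QI (record[0])."""
    ordered = sorted(records, key=lambda r: r[0])
    clusters = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        leftover = clusters.pop()
        clusters[-1].extend(leftover)
    return clusters


def t_closeness_microaggregation(records, k, t):
    """Microaggregate, then repeatedly merge the cluster with the greatest EMD
    into its nearest cluster (by QI mean) until every cluster is within t."""
    domain = sorted({s for _, s in records})
    overall = distribution([s for _, s in records], domain)

    def emd_to_x(cluster):
        return emd_ordered(distribution([s for _, s in cluster], domain), overall)

    clusters = microaggregation(records, k)
    while len(clusters) > 1:
        worst = max(range(len(clusters)), key=lambda i: emd_to_x(clusters[i]))
        if emd_to_x(clusters[worst]) <= t:
            break  # every cluster is within t of X
        centroid = mean(qi for qi, _ in clusters[worst])
        nearest = min(
            (i for i in range(len(clusters)) if i != worst),
            key=lambda i: abs(mean(qi for qi, _ in clusters[i]) - centroid),
        )
        clusters[worst].extend(clusters[nearest])  # merge C' into C
        del clusters[nearest]
    return clusters


# Hypothetical records: (age as the single QI, salary in K as the sensitive value).
X = [(23, 20), (25, 20), (27, 20), (31, 50), (33, 50), (35, 50),
     (41, 30), (44, 30), (47, 30)]
result = t_closeness_microaggregation(X, k=3, t=0.35)
print([len(c) for c in result])  # [6, 3]
```

With t = 0.35 the two most skewed clusters are merged once and the loop stops; with a stricter t = 0.2 merging continues until a single cluster remains, which trivially satisfies t-closeness but maximizes information loss. Merging never shrinks a cluster, so k-anonymity is preserved throughout.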
30. Algorithm
• Top-Down Heuristic Algorithm:
• Input: T (table of tuples), k (cluster size), Q (set of queries), and B_Qi (bound for query Qi)
• Output: P, the output partitions
• STEPS:
1. Initialize the candidate partitions: CP ← T.
2. For each partition in CP, do the following:
a) Find the queries that overlap the partition.
b) Select the queries with the least IB, where IB > 0.
c) Select the query with the smallest bound.
d) Create a query cut.
e) If the cut does not produce a skewed partition, a feasible cut has been found; add the resulting partitions to CP.
Otherwise, reject the cut.
3. Return P.
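The Top-Down Heuristic can be sketched for a single numerical attribute. The slide does not define imprecision or skewness, so this sketch assumes that queries are ranges (lo, hi, bound), that a query's imprecision is the number of tuples in the partitions it overlaps minus the number of tuples actually in its range, and that a cut is skewed when it leaves any partition with fewer than k tuples. All names are hypothetical, and this is a simplified illustration of query cuts rather than a faithful implementation of any published variant.

```python
def overlaps(part, query):
    """True if the query range intersects the sorted partition's value range."""
    lo, hi, _ = query
    return part[0] <= hi and part[-1] >= lo


def imprecision(partitions, query):
    """Tuples in all overlapping partitions minus tuples truly in the range."""
    lo, hi, _ = query
    hit = [p for p in partitions if overlaps(p, query)]
    return sum(len(p) for p in hit) - sum(lo <= v <= hi for p in hit for v in p)


def query_cut(part, query):
    """Split a sorted partition at the query's boundaries (empty pieces dropped)."""
    lo, hi, _ = query
    pieces = ([v for v in part if v < lo],
              [v for v in part if lo <= v <= hi],
              [v for v in part if v > hi])
    return [p for p in pieces if p]


def top_down_heuristic(table, k, queries):
    """Top-down partitioning sketch over one numerical QI."""
    candidates = [sorted(table)]  # CP <- T
    output = []
    while candidates:
        part = candidates.pop()
        # Queries overlapping this partition that are still imprecise on it,
        # tried in order of smallest bound first.
        pending = [q for q in queries if imprecision([part], q) > 0]
        for q in sorted(pending, key=lambda query: query[2]):
            pieces = query_cut(part, q)
            if len(pieces) > 1 and all(len(p) >= k for p in pieces):
                candidates.extend(pieces)  # feasible cut: keep refining
                break
        else:
            output.append(part)  # no feasible cut: final partition
    return output


queries = [(1, 6, 2), (4, 9, 5)]
partitions = top_down_heuristic(range(1, 13), k=3, queries=queries)
print(sorted(partitions))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
```

Every accepted cut produces strictly smaller pieces, so the loop terminates; here both queries end with zero imprecision while every partition still holds at least k = 3 tuples.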
31. Software Requirements
• Operating System: Windows 7
• Coding Language: Java
• IDE: NetBeans 8.0.2
• Database: MySQL 5.0
• Java Platform: JDK 1.8
32. Advantages of the System
• Easy sharing of privacy-sensitive data for analysis: business-to-business, entity-to-entity, and government-to-government.
• The anonymization technique can be combined with an access control mechanism to ensure both the security and the privacy of sensitive information.
33. Application
• Electronic commerce results in the automated collection of large
amounts of consumer data. These data, which are gathered by many
companies, are shared with subsidiaries and partners.
• Health care is a very sensitive sector with strict regulations: the use of protected health information in medical research is strictly regulated.
36. Conclusion
• The access control mechanism allows only authorized query predicates
on sensitive data.
• The proposed work uses microaggregation as a method to attain k-anonymous t-closeness.
• It maintains data in a secure manner.
37. Future Scope
• The k-means clustering parameters could be chosen dynamically through the k-means algorithm.
38. References
[1] R. Brand, J. Domingo-Ferrer, and J. M. Mateo-Sanz, “Reference data sets to test and compare SDC methods for protection of numerical microdata,” European Project IST-2000-25069 CASC, 2002. [Online]. Available: http://neon.vb.cbs.nl/casc/CASCtestsets.htm
[2] J. Cao, P. Karras, P. Kalnis, and K.-L. Tan, “SABRE: A sensitive attribute bucketization and redistribution framework for t-closeness,” VLDB J., vol. 20, no. 1, pp. 59–81, 2011.
[3] D. Defays and P. Nanopoulos, “Panels of enterprises and confidentiality: The small aggregates method,” in Proc. Symp. Design Anal. Longitudinal Surveys, 1992, pp. 195–204.
[4] J. Domingo-Ferrer and V. Torra, “A quantitative comparison of disclosure control methods for microdata,” in Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, L. Zayatz, P. Doyle, J. Theeuwes, and J. Lane, Eds. Amsterdam, The Netherlands: North Holland, 2001, pp. 111–134.
[5] J. Domingo-Ferrer and J. M. Mateo-Sanz, “Practical data-oriented microaggregation for statistical disclosure control,” IEEE Trans. Knowl. Data Eng., vol. 14, no. 1, pp. 189–201, Jan./Feb. 2002.
39. References
[6] J. Domingo-Ferrer and U. González-Nicolás, “Hybrid microdata using microaggregation,” Inf. Sci., vol. 180, no. 15, pp. 2834–2844, 2010.
[7] J. Domingo-Ferrer, D. Sánchez, and G. Rufian-Torrell, “Anonymization of nominal data based on semantic marginality,” Inf. Sci., vol. 242, pp. 35–48, 2013.
[8] J. Domingo-Ferrer and J. Soria-Comas, “From t-closeness to differential privacy and vice versa in data anonymization,” Knowl.-Based Syst., vol. 74, pp. 151–158, 2015.
[9] J. Domingo-Ferrer and V. Torra, “Ordinal, continuous and heterogeneous k-anonymity through microaggregation,” Data Mining Knowl. Discovery, vol. 11, no. 2, pp. 195–212, 2005.
[10] C. Dwork, “Differential privacy,” in Proc. 33rd Int. Colloquium Automata, Languages Programm., 2006, pp. 1–12.