Distinct l-diversity Anonymization of Set Valued Data
Submitted by,
Khude Rohan Ravindra
Abhishek Puligudla
Abhilash Namdev
Guidance by,
B. K. Tripathy
Contents
• Basic
• Abstract
• Introduction
• Literature survey
• Proposed algorithm
• Conclusion and Future work
• References
Set Valued data
There are two ways of representing data in the table
• Singular valued data: each record holds a single value, e.g.
    Blood Pressure
    Heart disease
    Hemorrhoids
    Hemorrhoids
• Set Valued data: each record holds a set of values, e.g.
    “Cancer”, “Blood Pressure”
    “Blood Pressure”, “Heart disease”
    “Hemorrhoids”, “Blood Pressure”
    “Heart disease”, “Blood Pressure”,
    “Diabetes”, “Hemorrhoids”, “Cancer”
Anonymization
• Data anonymization is a type of information sanitization whose intent is
privacy protection or privacy preservation
• It is the process of either encrypting or removing personally
identifiable information from data sets, so that the people whom the
data describe remain anonymous
Privacy Preservation
• Privacy here means the logical security of data, NOT the traditional
security of data e.g. access control, theft, hacking etc.
• Here, the adversary uses only legitimate methods
• Various databases are published e.g. Census data, Hospital records
• Allows researchers to effectively study the correlation between various
attributes
Need for Privacy
• Suppose a hospital has some person-specific patient data which it
wants to publish
• It wants to publish the data such that:
    • Information remains practically useful
    • Identity of an individual cannot be determined
• An adversary might still infer the secret/sensitive data from the published
database
Need for Privacy
Published Data
     |    Non-Sensitive Data      | Sensitive Data
#    Zip     Age   Nationality     Condition
1    13053   28    Indian          Heart Disease
2    13067   29    American        Heart Disease
3    13053   35    Canadian        Viral Infection
4    13067   36    Japanese        Cancer

Voter List
#    Name    Zip     Age   Nationality
1    John    13053   28    American
2    Bob     13067   29    American
3    Chris   13053   23    American

Data leak!
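The leak on this slide can be sketched as a join between the two tables. This is a minimal illustration, not the authors' code: the function name `link_records` is hypothetical, and it assumes the adversary links on zip code and age only.

```python
# Published (de-identified) records: (zip, age, nationality, condition)
published = [
    ("13053", 28, "Indian", "Heart Disease"),
    ("13067", 29, "American", "Heart Disease"),
    ("13053", 35, "Canadian", "Viral Infection"),
    ("13067", 36, "Japanese", "Cancer"),
]

# Public voter list: (name, zip, age, nationality)
voters = [
    ("John", "13053", 28, "American"),
    ("Bob", "13067", 29, "American"),
    ("Chris", "13053", 23, "American"),
]


def link_records(voters, published):
    """Return {name: [conditions]} for voters whose (zip, age) matches a published row.

    Assumption: the adversary joins on zip code and age only.
    """
    index = {}
    for zip_code, age, _nationality, condition in published:
        index.setdefault((zip_code, age), []).append(condition)

    leaked = {}
    for name, zip_code, age, _nationality in voters:
        if (zip_code, age) in index:
            leaked[name] = index[(zip_code, age)]
    return leaked


leaked = link_records(voters, published)
for name, conditions in leaked.items():
    print(name, "->", conditions)
```

Under this assumption John and Bob are each linked to exactly one published record, while Chris has no matching row.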
Classification of Attributes
Key attributes
Name, address, phone number: uniquely identifying!
Always removed before release
Quasi-identifiers
Attributes whose values, taken together, can identify an individual when linked with external data
{ zip-code, nationality, age }
Sensitive attributes
Private information about individuals that must not be linked to them
{medical condition, salary, location}
Abstract
• Preserving the privacy of set-valued data is important to avoid
tampering
• Anonymizing implies that an adversary is not able to identify which
individual a record belongs to
• K-anonymity fails under homogeneity and background knowledge
attacks
• L-diversity overcomes these drawbacks of k-anonymity
• In this paper we propose the use of l-diversity, which uses the
sensitivity of the data when generalizing it for anonymization
• We also propose an algorithm for hiding sensitive attribute information
that could reveal an individual's identity in situations like homogeneity
and background knowledge attacks
Types of anonymization on set valued data
• K-anonymity
• Top-Down, Local Generalization
• Recoding
• L-diversity (we are using distinct l-diversity)
K - anonymity
• The information for each person contained in the released table
cannot be distinguished from at least k-1 individuals whose
information also appears in the release
• anonymity - the condition of being anonymous
• Change the data in such a way that for each tuple in the resulting table
there are at least (k-1) other tuples with the same values for the
quasi-identifier
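The definition above can be checked mechanically by grouping rows on their quasi-identifier values. The following is a minimal sketch; the function name `is_k_anonymous` and the dictionary row layout are assumptions made for illustration, and the sample rows are the hospital table used in this deck.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


QI = ("zip", "age", "nationality")

# Original table: every row has a unique quasi-identifier combination.
original = [
    {"zip": "13053", "age": "28", "nationality": "Indian", "condition": "Heart Disease"},
    {"zip": "13067", "age": "29", "nationality": "American", "condition": "Heart Disease"},
    {"zip": "13053", "age": "35", "nationality": "Canadian", "condition": "Viral Infection"},
    {"zip": "13067", "age": "36", "nationality": "Japanese", "condition": "Cancer"},
]

# Generalized table: all four rows share the same quasi-identifier values.
generalized = [
    {"zip": "130**", "age": "< 40", "nationality": "*", "condition": row["condition"]}
    for row in original
]

print(is_k_anonymous(original, QI, 2))
print(is_k_anonymous(generalized, QI, 4))
```

The original table fails even for k = 2, while the generalized table is 4-anonymous because all four tuples fall into a single group.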
Techniques for k-anonymization
• Generalization
-Replace the original value by a semantically consistent but less
specific value
• Suppression
-Data not released at all
-Can be Cell-Level or (more commonly) Tuple-Level
Techniques for anonymization
#    Zip     Age    Nationality   Condition
1    130**   < 40   *             Heart Disease
2    130**   < 40   *             Heart Disease
3    130**   < 40   *             Viral Infection
4    130**   < 40   *             Cancer
Zip, Age: Generalization. Nationality: Suppression (cell-level)
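As a rough sketch of how the table above could be produced from the original hospital records: the helper names (`generalize_zip`, `generalize_age`, `suppress`, `anonymize`) and the fixed parameters (keep three zip digits, age bound of 40) are assumptions chosen to match the slide, not the authors' implementation.

```python
def generalize_zip(zip_code, keep=3):
    """Generalization: keep the first `keep` digits and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_age(age, bound=40):
    """Generalization: replace an exact age by a coarse range."""
    return f"< {bound}" if age < bound else f">= {bound}"


def suppress(_value):
    """Cell-level suppression: the value is not released at all."""
    return "*"


def anonymize(row):
    """Generalize or suppress the quasi-identifiers; the sensitive value is kept."""
    return {
        "zip": generalize_zip(row["zip"]),
        "age": generalize_age(row["age"]),
        "nationality": suppress(row["nationality"]),
        "condition": row["condition"],
    }


original = [
    {"zip": "13053", "age": 28, "nationality": "Indian", "condition": "Heart Disease"},
    {"zip": "13067", "age": 29, "nationality": "American", "condition": "Heart Disease"},
    {"zip": "13053", "age": 35, "nationality": "Canadian", "condition": "Viral Infection"},
    {"zip": "13067", "age": 36, "nationality": "Japanese", "condition": "Cancer"},
]

released = [anonymize(row) for row in original]
for row in released:
    print(row["zip"], row["age"], row["nationality"], row["condition"])
```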
Generalization Hierarchies
ZIP:          13058, 13053 → 1305;   13067, 13063 → 1306;   1305, 1306 → 130
Age:          29, 28 → < 30;   35, 36 → 3*;   < 30, 3* → < 40;   < 40 → *
Nationality:  US, Canadian → American;   Japanese, Indian → Asian;   American, Asian → *
• Generalization Hierarchies: Data owner defines how values
can be generalized
• Table Generalization: A table generalization is created by
generalizing all values in a column to a
specific level of generalization
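A generalization hierarchy can be stored as a child-to-parent map, and a table generalization then lifts every value in a column by the same number of levels. This is a minimal sketch; the tree shapes are read from the figure on this slide, and the names `generalize` and `generalize_column` are hypothetical.

```python
# Child -> parent maps, as read from the hierarchy figure (an assumption).
ZIP_PARENT = {
    "13058": "1305", "13053": "1305",
    "13067": "1306", "13063": "1306",
    "1305": "130", "1306": "130",
}
AGE_PARENT = {
    "29": "< 30", "28": "< 30",
    "35": "3*", "36": "3*",
    "< 30": "< 40", "3*": "< 40",
    "< 40": "*",
}
NATIONALITY_PARENT = {
    "US": "American", "Canadian": "American",
    "Japanese": "Asian", "Indian": "Asian",
    "American": "*", "Asian": "*",
}


def generalize(value, parent, levels):
    """Climb `levels` steps up the hierarchy, stopping early at the root."""
    for _ in range(levels):
        if value not in parent:  # already at the top of this hierarchy
            break
        value = parent[value]
    return value


def generalize_column(values, parent, levels):
    """Table generalization: lift every value in a column by the same number of levels."""
    return [generalize(v, parent, levels) for v in values]


print(generalize_column(["13053", "13067", "13053", "13067"], ZIP_PARENT, 2))
print(generalize_column(["28", "29", "35", "36"], AGE_PARENT, 2))
print(generalize_column(["Indian", "US", "Canadian", "Japanese"], NATIONALITY_PARENT, 2))
```

Because the data owner defines the maps, the same two functions work for any attribute once its hierarchy is written down.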
K-Anonymity Drawbacks
• K-anonymity alone does not provide full privacy!
• Two types of attack affect k-anonymity:
    • Homogeneity attacks
    • Background knowledge attacks
Homogeneity attacks
[Figure: original table and its 4-anonymous version]
Alice and Bob are neighbors, so Alice knows that Bob is a 31-year-old American male who lives in zip code 13053.
Hence, Alice knows that Bob's record is number 9, 10, 11, or 12. All of these records list the same condition, so
she can conclude from the data that Bob has cancer.
[Figure annotations: Umeko matches here; Bob matches here. Bob has Cancer!]
Background Knowledge Attacks
[Figure: original table and its 4-anonymous version]
Alice knows that Umeko is a 21-year-old Japanese female living in zip code 13068. From this information, Alice
infers that Umeko's information is contained in record number 1, 2, 3, or 4. Alice also has supplementary
information: Japanese people have an extremely low incidence of heart disease. She can therefore rule out heart
disease and conclude with near certainty that Umeko has a viral infection.
[Figure annotations: Umeko matches here; Bob matches here. Bob has Cancer! Umeko has Viral Infection!]
Attacks in Set-Valued data
Distinct L-diversity
• An equivalence class is said to have l-diversity if there are at least l
“well represented” values for the sensitive attribute.
• A table is said to have l-diversity if every equivalence class of the table
has l-diversity.
• The simplest reading of “well represented” is that each equivalence class
has at least l distinct values for the sensitive field. This is called
Distinct L-diversity.
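The distinct l-diversity condition can be checked by collecting the set of sensitive values in each equivalence class. The sketch below is illustrative only: the function name `is_distinct_l_diverse` is hypothetical, the first table is the generalized hospital table from this deck, and the second table is made-up data that mimics a homogeneous class.

```python
from collections import defaultdict


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].add(row[sensitive])
    return all(len(values) >= l for values in classes.values())


QI = ("zip", "age", "nationality")

# Generalized hospital table: one equivalence class with three distinct conditions.
generalized = [
    {"zip": "130**", "age": "< 40", "nationality": "*", "condition": "Heart Disease"},
    {"zip": "130**", "age": "< 40", "nationality": "*", "condition": "Heart Disease"},
    {"zip": "130**", "age": "< 40", "nationality": "*", "condition": "Viral Infection"},
    {"zip": "130**", "age": "< 40", "nationality": "*", "condition": "Cancer"},
]

# Hypothetical homogeneous class: 4-anonymous, but every record says "Cancer".
homogeneous = [
    {"zip": "130**", "age": "3*", "nationality": "*", "condition": "Cancer"}
    for _ in range(4)
]

print(is_distinct_l_diverse(generalized, QI, "condition", 3))
print(is_distinct_l_diverse(homogeneous, QI, "condition", 2))
```

The homogeneous class passes a k-anonymity check with k = 4 yet fails distinct 2-diversity, which is exactly the situation a homogeneity attack exploits.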
Applying Algorithm on Sensitive data
CONCLUSION AND FUTURE WORK
• Our algorithm is efficient enough to hide the sensitive attribute
information that might otherwise be used to reveal an individual's identity
in situations like homogeneity and background knowledge attacks.
• To do so, we generalize the sensitive attribute after obtaining diverse
clustered data.
• The anonymization technique we have proposed only serves
to make privacy breaches more difficult.
• It is still not clear how to de-anonymize the data.
• Our algorithm can be further extended to anonymize datasets
that have more than one sensitive attribute.
References
[1] H. Yeye, “Anonymization of Set-Valued Data via Top-Down, Local Generalization,” ACM, August 24-28, 2009.
[2] B. K. Tripathy, A. Mitra, “An Algorithm to achieve k-anonymity and l-diversity anonymization in Social Networks,” vol. 65, IEEE, 2012.
[3] S. Wang, Y. Tsai, H. Kao, T. Hong, “Anonymizing Set-Valued Social Data,” vol. 2, issue 1, IEEE/ACM, 2010.
[4] T. Manolis, M. Nikos, P. Kalnis, “Privacy Preserving Anonymization of Set-Valued Data,” vol. 1, issue 1, ACM, August 2008.
[5] D. K. Arora, D. Bansal and S. Sofat, “Comparative Analysis of Anonymization Techniques,” Int. J. of Electronics and Electrical Eng., vol. 7, pp. 773-778, 2014.
[6] S. Vinogradov, A. Pastsyak, “Evaluation of Data Anonymization Tools,” IARIA, 2012.

The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 

Distinct l-diversity Anonymization of Set Valued Data

  • 1. Distinct l-diversity Anonymization of Set Valued Data Submitted by, Khude Rohan Ravindra Abhishek Puligudla Abhilash Namdev Guidance by, B. K. Tripathy
  • 2. Contents • Basic • Abstract • Introduction • Literature survey • Proposed algorithm • Conclusion and Future work • References
  • 3. Set Valued data There are two ways of representing data in the table • Singular valued data, e.g. Blood Pressure; Heart disease; Hemorrhoids; Hemorrhoids • Set valued data, e.g. {“Cancer”, “Blood Pressure”}; {“Blood Pressure”, “Heart disease”}; {“Hemorrhoids”, “Blood Pressure”}; {“Heart disease”, “Blood Pressure”, “Diabetes”, “Hemorrhoids”, “Cancer”}
  • 4. Anonymization • Data anonymization is a type of information sanitization whose intent is privacy protection or privacy preservation • It is the process of either encrypting or removing personally identifiable information from data sets • The goal is that the people whom the data describe remain anonymous.
  • 5. Privacy Preservation • Privacy here means the logical security of data, NOT the traditional security of data, e.g. access control, theft, hacking etc. • Here, the adversary uses legitimate methods • Various databases are published, e.g. census data, hospital records • Publishing allows researchers to effectively study the correlation between various attributes
  • 6. Need for Privacy • Suppose a hospital has some person-specific patient data which it wants to publish • It wants to publish such that: • Information remains practically useful • Identity of an individual cannot be determined • An adversary might still infer the secret/sensitive data from the published database
  • 7. Need for Privacy
      Published Data (Zip, Age, Nationality: non-sensitive data; Condition: sensitive data)
      #  Zip    Age  Nationality  Condition
      1  13053  28   Indian       Heart Disease
      2  13067  29   American     Heart Disease
      3  13053  35   Canadian     Viral Infection
      4  13067  36   Japanese     Cancer
      Voter List
      #  Name   Zip    Age  Nationality
      1  John   13053  28   American
      2  Bob    13067  29   American
      3  Chris  13053  23   American
      Data leak!
  • 8. Classification of Attributes • Key attributes: name, address, phone number. Uniquely identifying! Always removed before release • Quasi-identifiers: attribute values which, taken together, can uniquely identify an individual, e.g. {zip-code, nationality, age} • Sensitive attributes: the information about individuals that is to be protected, e.g. {medical condition, salary, location}
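
The data leak on slide 7 can be sketched as a simple join on quasi-identifiers. This is only an illustration: the rows are copied from the two tables on slide 7, the function name `link` is hypothetical, and the join uses only zip and age as the quasi-identifier (an assumption made here, since the nationality values in the two tables do not agree for every row).

```python
# Published table from slide 7: names removed, sensitive column kept.
published = [
    {"zip": "13053", "age": 28, "nationality": "Indian", "condition": "Heart Disease"},
    {"zip": "13067", "age": 29, "nationality": "American", "condition": "Heart Disease"},
    {"zip": "13053", "age": 35, "nationality": "Canadian", "condition": "Viral Infection"},
    {"zip": "13067", "age": 36, "nationality": "Japanese", "condition": "Cancer"},
]

# Public voter list from slide 7: names present, no sensitive column.
voters = [
    {"name": "John", "zip": "13053", "age": 28},
    {"name": "Bob", "zip": "13067", "age": 29},
    {"name": "Chris", "zip": "13053", "age": 23},
]


def link(voters, published, keys=("zip", "age")):
    """Map each voter name to the conditions of published rows that share
    the same quasi-identifier values (hypothetical helper, not from the slides)."""
    leaks = {}
    for voter in voters:
        matches = [
            row["condition"]
            for row in published
            if all(row[k] == voter[k] for k in keys)
        ]
        if matches:
            leaks[voter["name"]] = matches
    return leaks


leaks = link(voters, published)
for name, conditions in leaks.items():
    print(name, "->", conditions)
```

A voter who matches exactly one published row is re-identified together with the sensitive value, even though the name column was removed before release.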
  • 9. Abstract • Preserving the privacy of set-valued data is important to avoid tampering • Anonymizing implies that an adversary is not able to identify which individual a record belongs to • K-anonymity fails under homogeneity and background knowledge attacks • L-diversity overcomes these drawbacks of k-anonymity • In this paper we propose the use of l-diversity, which uses sensitivity for generalizing the data when it is anonymized • We also propose an algorithm for hiding the sensitive attribute information that could reveal an individual's identity under homogeneity and background knowledge attacks
  • 10. Types of anonymization on set valued data • K-anonymity • Top-Down, Local Generalization • Recoding • L-diversity (we are using distinct l-diversity)
  • 11. K-anonymity • The information for each person contained in the released table cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release • Anonymity: the condition of being anonymous • Change the data in such a way that for each tuple in the resulting table there are at least (k-1) other tuples with the same value for the quasi-identifier
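
A minimal sketch of the k-anonymity condition from slide 11, assuming rows are plain dictionaries. The helper name `is_k_anonymous` is hypothetical; the raw rows follow the published table on slide 7 and the generalized rows follow the table on slide 13.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at
    least k rows, i.e. each tuple has at least k-1 indistinguishable peers."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


QI = ("zip", "age", "nationality")

# Raw rows (slide 7): every quasi-identifier combination is unique.
raw = [
    {"zip": "13053", "age": "28", "nationality": "Indian"},
    {"zip": "13067", "age": "29", "nationality": "American"},
    {"zip": "13053", "age": "35", "nationality": "Canadian"},
    {"zip": "13067", "age": "36", "nationality": "Japanese"},
]

# Generalized rows (slide 13): all four rows share one combination.
generalized = [{"zip": "130**", "age": "< 40", "nationality": "*"} for _ in range(4)]

print(is_k_anonymous(raw, QI, 2))
print(is_k_anonymous(generalized, QI, 4))
```

The check only looks at the quasi-identifier columns, which is why k-anonymity by itself says nothing about how varied the sensitive values inside a group are.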
  • 12. Techniques for k-anonymization • Generalization: replace the original value by a semantically consistent but less specific value • Suppression: data not released at all; can be cell-level or (more commonly) tuple-level
  • 13. Techniques for anonymization
      #  Zip    Age   Nationality  Condition
      1  130**  < 40  *            Heart Disease
      2  130**  < 40  *            Heart Disease
      3  130**  < 40  *            Viral Infection
      4  130**  < 40  *            Cancer
      Generalization: Zip, Age. Suppression (cell-level): Nationality.
  • 14. Generalization Hierarchies
      ZIP: 13058, 13053 → 1305; 13067, 13063 → 1306; 1305, 1306 → 130
      Age: 29, 28 → < 30; 35, 36 → 3*; < 30, 3* → < 40 → *
      Nationality: US, Canadian → American; Japanese, Indian → Asian; American, Asian → *
      • Generalization Hierarchies: the data owner defines how values can be generalized
      • Table Generalization: a table generalization is created by generalizing all values in a column to a specific level of generalization
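
The generalization and suppression steps from slides 12 to 14 might look roughly like the following. All function names are hypothetical, the cut-offs (keep three zip digits, age bound of 40) are chosen only to reproduce the table on slide 13, and the ">= 40" branch is an assumption that does not appear in the slides.

```python
def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` digits and mask the rest, e.g. 13053 -> 130**."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_age(age, bound=40):
    """Replace an exact age by a coarse range (the '>=' branch is assumed)."""
    return f"< {bound}" if age < bound else f">= {bound}"


def suppress(_value):
    """Cell-level suppression: the value is not released at all."""
    return "*"


def anonymize(row):
    """Apply one fixed level of generalization to every quasi-identifier column;
    the sensitive column is left untouched."""
    return {
        "zip": generalize_zip(row["zip"]),
        "age": generalize_age(row["age"]),
        "nationality": suppress(row["nationality"]),
        "condition": row["condition"],
    }


# Raw rows from the published table on slide 7.
raw = [
    {"zip": "13053", "age": 28, "nationality": "Indian", "condition": "Heart Disease"},
    {"zip": "13067", "age": 29, "nationality": "American", "condition": "Heart Disease"},
    {"zip": "13053", "age": 35, "nationality": "Canadian", "condition": "Viral Infection"},
    {"zip": "13067", "age": 36, "nationality": "Japanese", "condition": "Cancer"},
]

generalized = [anonymize(row) for row in raw]
for row in generalized:
    print(row)
```

Applying the same level to a whole column corresponds to the table generalization described on slide 14; a local recoding scheme would instead pick levels per group of rows.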
  • 15. K-Anonymity Drawbacks • K-anonymity alone does not provide full privacy! • There are two types of attacks that affect K-Anonymity. They are • Homogeneity Attacks and • Background Knowledge Attacks
  • 16. Homogeneity attacks (figure: original table and 4-anonymous table) Since Alice and Bob are neighbors, Alice knows that Bob is a 31-year-old American male who lives in zip code 13053. Hence, Alice knows that Bob's record is number 9, 10, 11, or 12. Because those records all show the same condition, she can also conclude from the data that Bob has cancer. (Figure callouts: Umeko matches here; Bob matches here; Bob has Cancer!)
  • 17. Background Knowledge Attacks (figure: original table and 4-anonymous table) Alice knows that Umeko is a 21-year-old Japanese female living in zip code 13068. Based on this information, Alice determines that Umeko's information is contained in record number 1, 2, 3, or 4. With the supplementary knowledge that Umeko is Japanese and that Japanese people have an extremely low incidence of heart disease, Alice can conclude with near certainty that Umeko has a viral infection. (Figure callouts: Umeko matches here; Bob matches here; Bob has Cancer!; Umeko has Viral Infection!)
  • 19. Distinct L-diversity • An equivalence class is said to have l-diversity if there are at least l “well represented” values for the sensitive attribute. • A table is said to have l-diversity if every equivalence class of the table has l-diversity. • The simplest reading of “well represented” is that each equivalence class has at least l distinct values for the sensitive field. This is called distinct l-diversity.
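
The definition on slide 19 can be checked with a short sketch like the one below. The helper names are hypothetical and this is not the algorithm proposed in the slides, only a test of the distinct l-diversity property. The first table follows slide 13; the second, all-cancer table is made-up data used to mimic the homogeneity attack.

```python
from collections import defaultdict


def distinct_diversity(rows, quasi_identifiers, sensitive):
    """Return the largest l for which the table is distinct l-diverse:
    the smallest number of distinct sensitive values in any equivalence class."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].add(row[sensitive])
    return min((len(values) for values in classes.values()), default=0)


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    return distinct_diversity(rows, quasi_identifiers, sensitive) >= l


QI = ("zip", "age", "nationality")

# Equivalence class from slide 13: three distinct conditions among four rows.
slide_13 = [
    {"zip": "130**", "age": "< 40", "nationality": "*", "condition": c}
    for c in ("Heart Disease", "Heart Disease", "Viral Infection", "Cancer")
]

# Hypothetical homogeneous class: 4-anonymous, yet every row reveals the same value.
homogeneous = [
    {"zip": "130**", "age": "3*", "nationality": "*", "condition": "Cancer"}
    for _ in range(4)
]

print(distinct_diversity(slide_13, QI, "condition"))
print(distinct_diversity(homogeneous, QI, "condition"))
```

A class with diversity 1 is exactly the situation exploited by the homogeneity attack: knowing which class a person falls into is enough to learn the sensitive value.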
  • 20. Applying Algorithm on Sensitive data
  • 21. CONCLUSION AND FUTURE WORK • Our algorithm is efficient enough to hide the sensitive attribute information that might otherwise be used to reveal an individual's identity in situations like homogeneity and background knowledge attacks. • To do so, we generalized the sensitive attribute after obtaining diverse clustered data. • The anonymization technique we have proposed only serves to make privacy breaches more difficult. • It is still not clear how to de-anonymize. • Our algorithm can be further extended to anonymize datasets that have more than one sensitive attribute.

Editor's Notes

  1. Basic: what is set valued data, anonymization, l-diversity, distinct l-diversity.
  2. "Anonymous" describes situations where the acting person's name is unknown. Sanitization is the process of removing sensitive information from a document or other message.