Jitendra Kuldeep
Information Technology
Privacy preserving data publishing
Microdata
• Purposes:
– Allow researchers to effectively study the correlation
between various attributes
– Protect the privacy of every patient
Name Age Sex Zipcode Disease
Bob 23 M 11000 pneumonia
Ken 27 M 13000 dyspepsia
Peter 35 M 59000 dyspepsia
Sam 59 M 12000 pneumonia
Jane 61 F 54000 flu
Linda 65 F 25000 gastritis
Alice 65 F 25000 flu
Mandy 70 F 30000 bronchitis
A naïve solution
• Remove the Name column and publish the remaining attributes.
• It does not work. See next.
Name Age Sex Zipcode Disease
Bob 23 M 11000 pneumonia
Ken 27 M 13000 dyspepsia
Peter 35 M 59000 dyspepsia
Sam 59 M 12000 pneumonia
Jane 61 F 54000 flu
Linda 65 F 25000 gastritis
Alice 65 F 25000 flu
Mandy 70 F 30000 bronchitis
Age Sex Zipcode Disease
23 M 11000 pneumonia
27 M 13000 dyspepsia
35 M 59000 dyspepsia
59 M 12000 pneumonia
61 F 54000 flu
65 F 25000 gastritis
65 F 25000 flu
70 F 30000 bronchitis
Generalization
A generalized table
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
Name Age Sex Zipcode
Bob 23 M 11000
• Transform each QI value into a less specific form
How much generalization do we need?
l-diversity
• A QI-group with m tuples is l-diverse, iff each sensitive
value appears no more than m / l times in the QI-group.
• A table is l-diverse, iff all of its QI-groups are l-diverse.
• The table below is 2-diverse and consists of 2 QI-groups.
Quasi-identifier (QI) attributes Sensitive attribute
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
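The l-diversity definition above can be checked mechanically. The following is a minimal sketch, assuming each QI-group is given simply as a list of its sensitive values; the function names are illustrative and do not come from the slides.

```python
from collections import Counter

def is_l_diverse_group(sensitive_values, l):
    """True if no sensitive value occurs more than m / l times among the m tuples."""
    m = len(sensitive_values)
    counts = Counter(sensitive_values)
    # Compare count * l <= m to stay in integer arithmetic.
    return all(c * l <= m for c in counts.values())

def is_l_diverse_table(groups, l):
    """A table is l-diverse iff every one of its QI-groups is l-diverse."""
    return all(is_l_diverse_group(g, l) for g in groups)

# Disease values of the two QI-groups in the generalized table.
group1 = ["pneumonia", "dyspepsia", "dyspepsia", "pneumonia"]
group2 = ["flu", "gastritis", "flu", "bronchitis"]

print(is_l_diverse_table([group1, group2], 2))  # True
print(is_l_diverse_table([group1, group2], 3))  # False
```

With l = 3 the check fails because pneumonia appears 2 times in a group of 4, and 2 > 4 / 3.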
What l-diversity guarantees
• From an l-diverse generalized table, an adversary
(without any prior knowledge) can infer the sensitive value
of each individual with confidence at most 1/l
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
Name Age Sex Zipcode
Bob 23 M 11000
A 2-diverse generalized table
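To make the 1/l bound concrete, the sketch below plays the adversary who knows Bob's QI values (23, M, 11000). It assumes the adversary treats every generalized tuple that covers Bob as equally likely to be his; the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# The 2-diverse generalized table: (age range, sex, zipcode range, disease).
generalized = [
    ((21, 60), "M", (10001, 60000), "pneumonia"),
    ((21, 60), "M", (10001, 60000), "dyspepsia"),
    ((21, 60), "M", (10001, 60000), "dyspepsia"),
    ((21, 60), "M", (10001, 60000), "pneumonia"),
    ((61, 70), "F", (10001, 60000), "flu"),
    ((61, 70), "F", (10001, 60000), "gastritis"),
    ((61, 70), "F", (10001, 60000), "flu"),
    ((61, 70), "F", (10001, 60000), "bronchitis"),
]

def candidate_diseases(table, age, sex, zipcode):
    """Diseases of all generalized tuples whose QI ranges cover the individual."""
    return [
        d for (alo, ahi), s, (zlo, zhi), d in table
        if alo <= age <= ahi and s == sex and zlo <= zipcode <= zhi
    ]

def confidence(diseases):
    """Adversary's confidence per disease, assuming candidates are equally likely."""
    m = len(diseases)
    return {d: Fraction(c, m) for d, c in Counter(diseases).items()}

bob = confidence(candidate_diseases(generalized, 23, "M", 11000))
print(bob)                                  # pneumonia and dyspepsia, 1/2 each
print(max(bob.values()) <= Fraction(1, 2))  # True: at most 1/l for l = 2
```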
Defect of generalization
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
• Estimated answer: 2 * p, where p is the probability that each of the
two pneumonia tuples satisfies the query conditions on Age and Zipcode
Defect of generalization (cont.)
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
• p = Area(R1 ∩ Q) / Area(R1) = 0.05, where R1 is the (Age, Zipcode)
rectangle of the generalized tuples and Q is the rectangle of the query
• Estimated answer for query A: 2 * p = 0.1
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] pneumonia
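The value p = 0.05 can be reproduced with a few lines of arithmetic. This sketch assumes that Age and Zipcode are integer-valued, that "area" means the number of integer points in a rectangle, and that a tuple is uniformly distributed inside its generalized rectangle; these assumptions are mine and simply happen to reproduce the figure on the slide.

```python
def overlap(lo1, hi1, lo2, hi2):
    """Number of integers shared by the inclusive ranges [lo1, hi1] and [lo2, hi2]."""
    return max(0, min(hi1, hi2) - max(lo1, lo2) + 1)

# R1: (Age, Zipcode) rectangle of the generalized pneumonia tuples.
r1_age, r1_zip = (21, 60), (10001, 60000)
# Q: rectangle of query A.
q_age, q_zip = (0, 30), (10001, 20000)

area_r1 = (r1_age[1] - r1_age[0] + 1) * (r1_zip[1] - r1_zip[0] + 1)
area_overlap = overlap(*r1_age, *q_age) * overlap(*r1_zip, *q_zip)

p = area_overlap / area_r1
estimate = 2 * p  # two generalized tuples have Disease = pneumonia
print(p, estimate)  # 0.05 0.1
```

The overlap covers 10 ages and 10,000 zipcodes out of 40 ages and 50,000 zipcodes, hence 100,000 / 2,000,000 = 0.05.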
Defect of generalization (cont.)
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
• Estimated answer from the generalized table: 0.1
Name Age Sex Zipcode Disease
Bob 23 M 11000 pneumonia
Ken 27 M 13000 dyspepsia
Peter 35 M 59000 dyspepsia
Sam 59 M 12000 pneumonia
Jane 61 F 54000 flu
Linda 65 F 25000 gastritis
Alice 65 F 25000 flu
Mandy 70 F 30000 bronchitis
• The exact answer should be: 1
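For comparison, the exact answer can be computed directly on the microdata. A minimal sketch, with the table hard-coded as Python tuples:

```python
# Original microdata: (name, age, sex, zipcode, disease).
microdata = [
    ("Bob", 23, "M", 11000, "pneumonia"),
    ("Ken", 27, "M", 13000, "dyspepsia"),
    ("Peter", 35, "M", 59000, "dyspepsia"),
    ("Sam", 59, "M", 12000, "pneumonia"),
    ("Jane", 61, "F", 54000, "flu"),
    ("Linda", 65, "F", 25000, "gastritis"),
    ("Alice", 65, "F", 25000, "flu"),
    ("Mandy", 70, "F", 30000, "bronchitis"),
]

# Query A evaluated on the true data.
exact = sum(
    1
    for _name, age, _sex, zipcode, disease in microdata
    if disease == "pneumonia" and 0 <= age <= 30 and 10001 <= zipcode <= 20000
)
print(exact)  # 1
```

Only Bob qualifies: Sam also has pneumonia, but his age (59) falls outside [0, 30].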
Contributions
1. We propose an alternative to
generalization, called Anatomy, which
allows much more accurate data
analysis while still preserving privacy.
2. We develop an algorithm for computing
anatomized tables that
• runs in linear I/Os
• (nearly) minimizes information loss
Outline
• Basic Idea of Anatomy
• Preserving Correlation
• Algorithm for Anatomy
• Experimental Results
Basic Idea of Anatomy
• For a given microdata table, Anatomy releases a quasi-identifier
table (QIT) and a sensitive table (ST)
Sensitive Table (ST)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
Quasi-identifier Table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
microdata
Age Sex Zipcode Disease
23 M 11000 pneumonia
27 M 13000 dyspepsia
35 M 59000 dyspepsia
59 M 12000 pneumonia
61 F 54000 flu
65 F 25000 gastritis
65 F 25000 flu
70 F 30000 bronchitis
Basic Idea of Anatomy (cont.)
1. Select a partition of the tuples
Age Sex Zipcode Disease
23 M 11000 pneumonia
27 M 13000 dyspepsia
35 M 59000 dyspepsia
59 M 12000 pneumonia
61 F 54000 flu
65 F 25000 gastritis
65 F 25000 flu
70 F 30000 bronchitis
QI group 1: the first four tuples
QI group 2: the last four tuples
Together they form a 2-diverse partition.
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive
table (ST) based on the selected partition
sensitive table (ST)
Disease
pneumonia
dyspepsia
dyspepsia
pneumonia
flu
gastritis
flu
bronchitis
quasi-identifier table (QIT)
Age Sex Zipcode
23 M 11000
27 M 13000
35 M 59000
59 M 12000
61 F 54000
65 F 25000
65 F 25000
70 F 30000
In both tables, the first four rows belong to group 1 and the last four rows to group 2.
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive
table (ST) based on the selected partition
sensitive table (ST)
Group-ID Disease
1 pneumonia
1 dyspepsia
1 dyspepsia
1 pneumonia
2 flu
2 gastritis
2 flu
2 bronchitis
quasi-identifier table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive
table (ST) based on the selected partition
sensitive table (ST)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
quasi-identifier table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
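The two steps above (select a partition, then emit the QIT and the ST) can be sketched as follows. The partition is taken as given, as a list of row-index lists; `build_qit_st` is a hypothetical helper name, and sorting the ST rows by disease is only done to match the layout on the slide.

```python
from collections import Counter

# Microdata without names: (age, sex, zipcode, disease).
microdata = [
    (23, "M", 11000, "pneumonia"), (27, "M", 13000, "dyspepsia"),
    (35, "M", 59000, "dyspepsia"), (59, "M", 12000, "pneumonia"),
    (61, "F", 54000, "flu"), (65, "F", 25000, "gastritis"),
    (65, "F", 25000, "flu"), (70, "F", 30000, "bronchitis"),
]
# The 2-diverse partition from the slides, as lists of row indices.
partition = [[0, 1, 2, 3], [4, 5, 6, 7]]

def build_qit_st(rows, partition):
    """Split rows into a QIT (exact QI values + Group-ID) and an ST (per-group disease counts)."""
    qit, st = [], []
    for gid, members in enumerate(partition, start=1):
        for i in members:
            age, sex, zipcode, _disease = rows[i]
            qit.append((age, sex, zipcode, gid))
        counts = Counter(rows[i][3] for i in members)
        for disease in sorted(counts):
            st.append((gid, disease, counts[disease]))
    return qit, st

qit, st = build_qit_st(microdata, partition)
for row in st:
    print(row)
```

Note that the QIT keeps the exact QI values; only the link between a QIT row and a specific disease is hidden behind the Group-ID.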
Privacy Preservation
• From a pair of QIT and ST generated from an l-diverse
partition, the adversary can infer the sensitive value of
each individual with confidence at most 1/l
sensitive table (ST)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
quasi-identifier table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
Name Age Sex Zipcode
Bob 23 M 11000
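The privacy claim can be illustrated by joining the QIT and the ST on Group-ID for Bob's QI values. This is a sketch under the assumption that the adversary has no knowledge beyond the two published tables; `disease_confidence` is a hypothetical helper and only handles individuals whose QI values fall in a single group.

```python
from fractions import Fraction

qit = [
    (23, "M", 11000, 1), (27, "M", 13000, 1), (35, "M", 59000, 1), (59, "M", 12000, 1),
    (61, "F", 54000, 2), (65, "F", 25000, 2), (65, "F", 25000, 2), (70, "F", 30000, 2),
]
st = [
    (1, "dyspepsia", 2), (1, "pneumonia", 2),
    (2, "bronchitis", 1), (2, "flu", 2), (2, "gastritis", 1),
]

def disease_confidence(qit, st, age, sex, zipcode):
    """Adversary's confidence per disease for the individual with these QI values."""
    group_ids = {g for a, s, z, g in qit if (a, s, z) == (age, sex, zipcode)}
    if len(group_ids) != 1:
        raise ValueError("this sketch expects the QI values to match exactly one group")
    (gid,) = group_ids
    group_size = sum(c for g, _d, c in st if g == gid)
    return {d: Fraction(c, group_size) for g, d, c in st if g == gid}

bob = disease_confidence(qit, st, 23, "M", 11000)
print(bob)  # dyspepsia and pneumonia, 1/2 each
```

Bob is in group 1, whose ST rows give 2 dyspepsia and 2 pneumonia among 4 tuples, so neither disease can be inferred with confidence above 1/2.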
Accuracy of Data Analysis
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
sensitive table (ST)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
quasi-identifier table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
Accuracy of Data Analysis (cont.)
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
• 2 patients in group 1 have contracted pneumonia
• 2 out of the 4 patients in group 1 satisfy the query conditions on Age and
Zipcode
• Estimated answer for query A: 2 * 2 / 4 = 1, which is also the
actual result from the original microdata
Tuple Age Sex Zipcode Group-ID
t1 23 M 11000 1
t2 27 M 13000 1
t3 35 M 59000 1
t4 59 M 12000 1
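The estimate 2 * 2 / 4 = 1 generalizes to a simple rule: for every ST row with the queried disease, multiply its count by the fraction of QIT rows in the same group that satisfy the QI conditions. The sketch below follows that rule; the function name and argument layout are assumptions for illustration.

```python
qit = [
    (23, "M", 11000, 1), (27, "M", 13000, 1), (35, "M", 59000, 1), (59, "M", 12000, 1),
    (61, "F", 54000, 2), (65, "F", 25000, 2), (65, "F", 25000, 2), (70, "F", 30000, 2),
]
st = [
    (1, "dyspepsia", 2), (1, "pneumonia", 2),
    (2, "bronchitis", 1), (2, "flu", 2), (2, "gastritis", 1),
]

def estimate_count(qit, st, disease, age_range, zip_range):
    """Estimate COUNT(*) for a disease plus range conditions on Age and Zipcode."""
    total = 0.0
    for gid, d, count in st:
        if d != disease:
            continue
        group = [(a, z) for a, _s, z, g in qit if g == gid]
        hits = sum(
            1 for a, z in group
            if age_range[0] <= a <= age_range[1] and zip_range[0] <= z <= zip_range[1]
        )
        # count tuples carry the disease; hits / len(group) of the group match the QI conditions.
        total += count * hits / len(group)
    return total

print(estimate_count(qit, st, "pneumonia", (0, 30), (10001, 20000)))  # 1.0
```

In group 1, t1 and t2 satisfy the Age and Zipcode conditions while t3 and t4 do not, which gives the 2 / 4 factor.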
Preserving Correlation
• Let us first examine the correlation between Age and
Disease in our running example
• Each tuple in the microdata can be mapped to a point in
the (Age, Disease) domain
• Tuple t1 below can be mapped to (23, pneumonia).
Tuple Age Sex Zipcode Disease
t1 23 M 11000 pneumonia
… … … … …
Preserving Correlation (cont.)
• We model this tuple using a probability density function
(pdf):
Preserving Correlation (cont.)
Anatomize
• An algorithm for computing anatomized
tables that
– runs in I/O cost linear in the cardinality n of
the microdata table
– minimizes the reconstruction error (RCE) when n is a multiple of l;
otherwise achieves an RCE that is higher
than the lower bound by a factor of at most
1 + 1/n
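The slides do not spell out the steps of Anatomize. The sketch below is therefore only an assumed greedy bucketing strategy for building an l-diverse partition, not necessarily the algorithm referred to above, and it makes no claim about I/O cost or RCE optimality. It hashes tuples into buckets by sensitive value, repeatedly draws one tuple from each of the l largest buckets to form a group, and then places leftovers into groups that lack their value.

```python
from collections import defaultdict

def greedy_l_diverse_partition(sensitive_values, l):
    """Return groups of row indices in which all sensitive values are distinct."""
    buckets = defaultdict(list)
    for i, v in enumerate(sensitive_values):
        buckets[v].append(i)

    groups = []
    while sum(1 for b in buckets.values() if b) >= l:
        # Take one tuple from each of the l currently largest buckets
        # (ties broken by value so the result is deterministic).
        largest = sorted(buckets, key=lambda v: (-len(buckets[v]), v))[:l]
        groups.append([buckets[v].pop() for v in largest])

    # Leftover tuples join any group that does not yet contain their value.
    for v, bucket in buckets.items():
        for i in bucket:
            for g in groups:
                if all(sensitive_values[j] != v for j in g):
                    g.append(i)
                    break
            else:
                raise ValueError("no l-diverse partition found by this heuristic")
    return groups

diseases = ["pneumonia", "dyspepsia", "dyspepsia", "pneumonia",
            "flu", "gastritis", "flu", "bronchitis"]
groups = greedy_l_diverse_partition(diseases, 2)
print(groups)
```

Because every group holds at least l tuples with pairwise distinct sensitive values, each value occurs once in a group of size m >= l, which satisfies the m / l bound of l-diversity. The groups produced here need not coincide with the two groups of four used in the running example.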
Accuracy of Data Analysis
Summary
• Anatomy outperforms generalization by allowing
much more accurate data analysis on the
published data.
• Anatomized tables (with a nearly optimal quality
guarantee) can be computed in I/O cost linear in
the database cardinality.
Anatomy: Simple and Effective Privacy Preservation
