Jitendra Kuldeep
Information Technology
Privacy preserving data publishing
Microdata
• Purposes:
– Allow researchers to effectively study the correlation
between various attributes
– Protect the privacy of every patient
Name Age Sex Zipcode Disease
Bob 23 M 11000 pneumonia
Ken 27 M 13000 dyspepsia
Peter 35 M 59000 dyspepsia
Sam 59 M 12000 pneumonia
Jane 61 F 54000 flu
Linda 65 F 25000 gastritis
Alice 65 F 25000 flu
Mandy 70 F 30000 bronchitis
A naïve solution
• Publish the microdata after removing the Name column.
• It does not work. See next.
Name Age Sex Zipcode Disease
Bob 23 M 11000 pneumonia
Ken 27 M 13000 dyspepsia
Peter 35 M 59000 dyspepsia
Sam 59 M 12000 pneumonia
Jane 61 F 54000 flu
Linda 65 F 25000 gastritis
Alice 65 F 25000 flu
Mandy 70 F 30000 bronchitis
Age Sex Zipcode Disease
23 M 11000 pneumonia
27 M 13000 dyspepsia
35 M 59000 dyspepsia
59 M 12000 pneumonia
61 F 54000 flu
65 F 25000 gastritis
65 F 25000 flu
70 F 30000 bronchitis
Generalization
A generalized table
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
Name Age Sex Zipcode
Bob 23 M 11000
• Transform each QI value into a less specific form
How much generalization do we need?
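A minimal sketch of the generalization step, assuming the interval boundaries shown in the generalized table. The names AGE_BINS, ZIP_BINS, to_interval and generalize are hypothetical and chosen only for illustration.

```python
# Interval boundaries copied from the generalized table (an assumption:
# the slides do not say how these boundaries were chosen).
AGE_BINS = [(21, 60), (61, 70)]
ZIP_BINS = [(10001, 60000)]

MICRODATA = [
    (23, "M", 11000, "pneumonia"),
    (27, "M", 13000, "dyspepsia"),
    (35, "M", 59000, "dyspepsia"),
    (59, "M", 12000, "pneumonia"),
    (61, "F", 54000, "flu"),
    (65, "F", 25000, "gastritis"),
    (65, "F", 25000, "flu"),
    (70, "F", 30000, "bronchitis"),
]


def to_interval(value, bins):
    """Return the closed interval (lo, hi) that contains value."""
    for lo, hi in bins:
        if lo <= value <= hi:
            return (lo, hi)
    raise ValueError(f"{value} is not covered by any interval")


def generalize(row):
    """Replace the Age and Zipcode of one tuple by their intervals."""
    age, sex, zipcode, disease = row
    return (to_interval(age, AGE_BINS), sex,
            to_interval(zipcode, ZIP_BINS), disease)


generalized = [generalize(row) for row in MICRODATA]
```

Wider intervals hide more but also blur the data more, which is the trade-off behind the question on this slide.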
l-diversity
• A QI-group with m tuples is l-diverse, iff each sensitive
value appears no more than m / l times in the QI-group.
• A table is l-diverse, iff all of its QI-groups are l-diverse.
• The above table is 2-diverse.
2 QI-groups
Quasi-identifier (QI) attributes Sensitive attribute
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
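The definition can be checked mechanically. The sketch below is illustrative only: is_l_diverse is a hypothetical helper, and each row is reduced to a (QI-group key, sensitive value) pair.

```python
from collections import Counter, defaultdict

# (QI-group key, sensitive value) pairs for the generalized table above.
TABLE = [
    ("group 1", "pneumonia"),
    ("group 1", "dyspepsia"),
    ("group 1", "dyspepsia"),
    ("group 1", "pneumonia"),
    ("group 2", "flu"),
    ("group 2", "gastritis"),
    ("group 2", "flu"),
    ("group 2", "bronchitis"),
]


def is_l_diverse(rows, l):
    """True iff, in every QI-group of m tuples, no sensitive value
    appears more than m / l times."""
    groups = defaultdict(list)
    for key, sensitive in rows:
        groups[key].append(sensitive)
    # count <= m / l is tested as count * l <= m to stay in integers.
    return all(
        max(Counter(values).values()) * l <= len(values)
        for values in groups.values()
    )


two_diverse = is_l_diverse(TABLE, 2)
three_diverse = is_l_diverse(TABLE, 3)
```

With m = 4 in both groups, the most frequent value appears twice, so the table passes for l = 2 and fails for l = 3.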
What l-diversity guarantees
• From an l-diverse generalized table, an adversary
(without any prior knowledge) can infer the sensitive value
of each individual with confidence at most 1/l
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
Name Age Sex Zipcode
Bob 23 M 11000
A 2-diverse generalized table
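To illustrate the 1/l bound, the sketch below plays the adversary who knows Bob's QI values and nothing else. The function name posterior is hypothetical, and the uniform guess over matching tuples is the assumption behind the bound.

```python
from collections import Counter

GENERALIZED = [
    ((21, 60), "M", (10001, 60000), "pneumonia"),
    ((21, 60), "M", (10001, 60000), "dyspepsia"),
    ((21, 60), "M", (10001, 60000), "dyspepsia"),
    ((21, 60), "M", (10001, 60000), "pneumonia"),
    ((61, 70), "F", (10001, 60000), "flu"),
    ((61, 70), "F", (10001, 60000), "gastritis"),
    ((61, 70), "F", (10001, 60000), "flu"),
    ((61, 70), "F", (10001, 60000), "bronchitis"),
]


def posterior(age, sex, zipcode, table):
    """Disease distribution over the generalized tuples that could
    belong to a person with the given QI values."""
    matches = [
        disease
        for (age_lo, age_hi), s, (zip_lo, zip_hi), disease in table
        if age_lo <= age <= age_hi and s == sex and zip_lo <= zipcode <= zip_hi
    ]
    if not matches:
        return {}
    counts = Counter(matches)
    return {disease: c / len(matches) for disease, c in counts.items()}


bob = posterior(23, "M", 11000, GENERALIZED)
```

Bob matches the four tuples of the first QI-group, so each candidate disease gets probability 1/2, which is exactly 1/l for l = 2.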
Defect of generalization
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] dyspepsia
[21, 60] M [10001, 60000] pneumonia
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] gastritis
[61, 70] F [10001, 60000] flu
[61, 70] F [10001, 60000] bronchitis
• Estimated answer: 2 * p, where p is the probability that each of the
two tuples satisfies the query conditions
Defect of generalization (cont.)
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
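• p = Area(R1 ∩ Q) / Area(R1) = 0.05, where R1 is the rectangle spanned by the group's generalized Age and Zipcode ranges and Q is the query's range on the same two attributes
• Estimated answer for query A: 2 * p = 0.1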
Age Sex Zipcode Disease
[21, 60] M [10001, 60000] pneumonia
[21, 60] M [10001, 60000] pneumonia
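The value p = 0.05 can be reproduced with a short calculation. This sketch assumes Age and Zipcode are integers spread uniformly and independently over the generalized ranges, so "area" is counted as a number of integer points. The helper names are hypothetical.

```python
def overlap(lo1, hi1, lo2, hi2):
    """Number of integers lying in both closed intervals."""
    return max(0, min(hi1, hi2) - max(lo1, lo2) + 1)


def match_probability(group_age, group_zip, query_age, query_zip):
    """Fraction of the group's rectangle R1 that falls inside the query Q."""
    age_frac = overlap(*group_age, *query_age) / (group_age[1] - group_age[0] + 1)
    zip_frac = overlap(*group_zip, *query_zip) / (group_zip[1] - group_zip[0] + 1)
    return age_frac * zip_frac


# First QI-group: Age [21, 60], Zipcode [10001, 60000].
# Query A: Age [0, 30], Zipcode [10001, 20000].
p = match_probability((21, 60), (10001, 60000), (0, 30), (10001, 20000))
estimate = 2 * p  # two pneumonia tuples in the group
```

Ages 21 to 30 cover 10 of the 40 possible ages and the query covers 10000 of the 50000 zipcodes, giving p = 0.25 * 0.2 = 0.05.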
Defect of generalization (cont.)
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
• Estimated answer from the generalized table: 0.1
Name Age Sex Zipcode Disease
Bob 23 M 11000 pneumonia
Ken 27 M 13000 dyspepsia
Peter 35 M 59000 dyspepsia
Sam 59 M 12000 pneumonia
Jane 61 F 54000 flu
Linda 65 F 25000 gastritis
Alice 65 F 25000 flu
Mandy 70 F 30000 bronchitis
• The exact answer should be: 1
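For comparison, the exact answer can be computed directly on the microdata. This is a plain Python rendering of query A; the variable names are illustrative.

```python
MICRODATA = [
    ("Bob", 23, "M", 11000, "pneumonia"),
    ("Ken", 27, "M", 13000, "dyspepsia"),
    ("Peter", 35, "M", 59000, "dyspepsia"),
    ("Sam", 59, "M", 12000, "pneumonia"),
    ("Jane", 61, "F", 54000, "flu"),
    ("Linda", 65, "F", 25000, "gastritis"),
    ("Alice", 65, "F", 25000, "flu"),
    ("Mandy", 70, "F", 30000, "bronchitis"),
]

# Count tuples with pneumonia, Age in [0, 30], Zipcode in [10001, 20000].
exact = sum(
    1
    for _name, age, _sex, zipcode, disease in MICRODATA
    if disease == "pneumonia" and 0 <= age <= 30 and 10001 <= zipcode <= 20000
)
```

Only Bob qualifies; Sam also has pneumonia but is 59.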
Contributions
1. We propose an alternative technique to
generalization, called Anatomy, which
allows much more accurate data
analysis while still preserving privacy.
2. We develop an algorithm for computing
anatomized tables that
• runs in linear I/Os
• (nearly) minimizes information loss
Outline
• Basic Idea of Anatomy
• Preserving Correlation
• Algorithm for Anatomy
• Experimental Results
Basic Idea of Anatomy
• For a given microdata table, Anatomy releases a quasi-identifier
table (QIT) and a sensitive table (ST)
microdata
Age Sex Zipcode Disease
23 M 11000 pneumonia
27 M 13000 dyspepsia
35 M 59000 dyspepsia
59 M 12000 pneumonia
61 F 54000 flu
65 F 25000 gastritis
65 F 25000 flu
70 F 30000 bronchitis
Quasi-identifier Table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
Sensitive Table (ST)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
Basic Idea of Anatomy (cont.)
1. Select a partition of the tuples
Age Sex Zipcode Disease
23 M 11000 pneumonia
27 M 13000 dyspepsia
35 M 59000 dyspepsia
59 M 12000 pneumonia
61 F 54000 flu
65 F 25000 gastritis
65 F 25000 flu
70 F 30000 bronchitis
A 2-diverse partition: QI group 1 is the first four tuples, QI group 2 is the last four tuples
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive
table (ST) based on the selected partition
quasi-identifier table (QIT)
Age Sex Zipcode
23 M 11000
27 M 13000
35 M 59000
59 M 12000
61 F 54000
65 F 25000
65 F 25000
70 F 30000
sensitive table (ST)
Disease
pneumonia
dyspepsia
dyspepsia
pneumonia
flu
gastritis
flu
bronchitis
In both tables, group 1 is the first four rows and group 2 is the last four rows
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive
table (ST) based on the selected partition
quasi-identifier table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
sensitive table (ST)
Group-ID Disease
1 pneumonia
1 dyspepsia
1 dyspepsia
1 pneumonia
2 flu
2 gastritis
2 flu
2 bronchitis
Basic Idea of Anatomy (cont.)
2. Generate a quasi-identifier table (QIT) and a sensitive
table (ST) based on the selected partition
quasi-identifier table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
sensitive table (ST)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
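Step 2 can be sketched in a few lines once a partition is given. The function build_qit_st and the list GROUP_OF are hypothetical names; the partition itself (first four tuples in group 1, last four in group 2) is the one used in the running example.

```python
from collections import Counter

MICRODATA = [
    (23, "M", 11000, "pneumonia"),
    (27, "M", 13000, "dyspepsia"),
    (35, "M", 59000, "dyspepsia"),
    (59, "M", 12000, "pneumonia"),
    (61, "F", 54000, "flu"),
    (65, "F", 25000, "gastritis"),
    (65, "F", 25000, "flu"),
    (70, "F", 30000, "bronchitis"),
]
GROUP_OF = [1, 1, 1, 1, 2, 2, 2, 2]  # group id of each tuple, in order


def build_qit_st(microdata, group_of):
    """Split the microdata into a QIT and an ST linked only by group id."""
    # QIT: exact QI values plus the group id, sensitive value dropped.
    qit = [(age, sex, zipcode, g)
           for (age, sex, zipcode, _disease), g in zip(microdata, group_of)]
    # ST: per-group counts of each sensitive value.
    counts = Counter((g, disease)
                     for (_a, _s, _z, disease), g in zip(microdata, group_of))
    st = sorted((g, disease, c) for (g, disease), c in counts.items())
    return qit, st


qit, st = build_qit_st(MICRODATA, GROUP_OF)
```

Unlike generalization, the QI values in the QIT stay exact; only the link between a tuple and its disease is blurred to the group level.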
Privacy Preservation
• From a pair of QIT and ST generated from an l-diverse
partition, the adversary can infer the sensitive value of
each individual with confidence at most 1/l
quasi-identifier table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
sensitive table (ST)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
Name Age Sex Zipcode
Bob 23 M 11000
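The same bound can be illustrated on the anatomized tables. In this sketch the adversary finds Bob's tuple in the QIT, reads off its Group-ID, and looks up that group in the ST. The helper disease_probabilities is a hypothetical name.

```python
ST = [
    (1, "dyspepsia", 2),
    (1, "pneumonia", 2),
    (2, "bronchitis", 1),
    (2, "flu", 2),
    (2, "gastritis", 1),
]


def disease_probabilities(group_id, st):
    """Probability of each disease for a person known to be in group_id,
    assuming every tuple of the group is equally likely to be theirs."""
    rows = [(disease, count) for g, disease, count in st if g == group_id]
    total = sum(count for _disease, count in rows)
    return {disease: count / total for disease, count in rows}


# Bob's QI values (23, M, 11000) appear in the QIT with Group-ID 1.
bob = disease_probabilities(1, ST)
group_2 = disease_probabilities(2, ST)
```

In both groups the most likely disease has probability 1/2, matching the 1/l bound for the 2-diverse partition.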
Accuracy of Data Analysis
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
quasi-identifier table (QIT)
Age Sex Zipcode Group-ID
23 M 11000 1
27 M 13000 1
35 M 59000 1
59 M 12000 1
61 F 54000 2
65 F 25000 2
65 F 25000 2
70 F 30000 2
sensitive table (ST)
Group-ID Disease Count
1 dyspepsia 2
1 pneumonia 2
2 bronchitis 1
2 flu 2
2 gastritis 1
Accuracy of Data Analysis (cont.)
• Query A: SELECT COUNT(*) FROM Unknown-Microdata
WHERE Disease = 'pneumonia' AND Age in [0, 30]
AND Zipcode in [10001, 20000]
• 2 patients in group 1 have contracted pneumonia
• 2 out of the 4 patients in group 1 satisfy the query condition on Age and
Zipcode
• Estimated answer for query A: 2 * 2 / 4 = 1, which is also the
actual result from the original microdata
Tuple Age Sex Zipcode Group-ID
t1 23 M 11000 1
t2 27 M 13000 1
t3 35 M 59000 1
t4 59 M 12000 1
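The estimate above generalizes to any count query of this shape. The sketch below sums, over all groups, (number of tuples with the disease) * (fraction of the group's QIT tuples matching the QI condition). The function estimate_count is a hypothetical name, not something defined in the slides.

```python
QIT = [
    (23, "M", 11000, 1),
    (27, "M", 13000, 1),
    (35, "M", 59000, 1),
    (59, "M", 12000, 1),
    (61, "F", 54000, 2),
    (65, "F", 25000, 2),
    (65, "F", 25000, 2),
    (70, "F", 30000, 2),
]
ST = [
    (1, "dyspepsia", 2),
    (1, "pneumonia", 2),
    (2, "bronchitis", 1),
    (2, "flu", 2),
    (2, "gastritis", 1),
]


def estimate_count(qit, st, disease, age_range, zip_range):
    """Estimate COUNT(*) for a query on Disease, Age and Zipcode."""
    estimate = 0.0
    for group_id in sorted({row[3] for row in qit}):
        members = [(age, z) for age, _sex, z, g in qit if g == group_id]
        # Exact QI values are published, so this fraction is exact.
        hits = sum(
            1 for age, z in members
            if age_range[0] <= age <= age_range[1]
            and zip_range[0] <= z <= zip_range[1]
        )
        with_disease = sum(c for g, d, c in st if g == group_id and d == disease)
        estimate += with_disease * hits / len(members)
    return estimate


answer = estimate_count(QIT, ST, "pneumonia", (0, 30), (10001, 20000))
```

Group 1 contributes 2 * 2 / 4 = 1 and group 2 contributes 0 because it contains no pneumonia tuples.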
Preserving Correlation
• Let us first examine the correlation between Age and
Disease in our running example
• Each tuple in the microdata can be mapped to a point in
the (Age, Disease) domain
• The above tuple can be mapped to (23, pneumonia).
Tuple Age Sex Zipcode Disease
t1 23 M 11000 pneumonia
… … … … …
Preserving Correlation (cont.)
• We model this tuple using a probability density function
(pdf):
Preserving Correlation (cont.)
Anatomize
• An algorithm for computing anatomized
tables that
– runs in I/O cost linear in the cardinality n of
the microdata table
– minimizes the reconstruction error (RCE) when n is a multiple of l;
otherwise achieves an RCE that is higher
than the lower bound by a factor of at most
1 + 1/n
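The slides do not spell out the steps of Anatomize, so the following is only an illustrative in-memory sketch of one way to form an l-diverse partition: bucket the tuples by sensitive value, repeatedly draw one tuple from each of the l largest buckets, then place leftovers into groups that lack their value. The function greedy_partition is a hypothetical name, and the sketch makes no claim about I/O cost or the RCE bound stated above.

```python
from collections import defaultdict


def greedy_partition(tuples, l):
    """tuples: list of (qi_values, sensitive_value). Returns a list of
    groups in which every sensitive value occurs at most once."""
    buckets = defaultdict(list)
    for t in tuples:
        buckets[t[1]].append(t)

    groups = []
    # Group creation: while l non-empty buckets remain, take one tuple
    # from each of the l largest buckets to form a new group.
    while sum(1 for b in buckets.values() if b) >= l:
        non_empty = [b for b in buckets.values() if b]
        largest = sorted(non_empty, key=len, reverse=True)[:l]
        groups.append([b.pop() for b in largest])

    # Residue assignment: each leftover tuple joins a group that does
    # not yet contain its sensitive value.
    for bucket in buckets.values():
        for t in bucket:
            target = next(
                (g for g in groups if all(u[1] != t[1] for u in g)), None
            )
            if target is None:
                raise ValueError("this greedy sketch found no valid group")
            target.append(t)
    return groups


MICRODATA = [
    ((23, "M", 11000), "pneumonia"),
    ((27, "M", 13000), "dyspepsia"),
    ((35, "M", 59000), "dyspepsia"),
    ((59, "M", 12000), "pneumonia"),
    ((61, "F", 54000), "flu"),
    ((65, "F", 25000), "gastritis"),
    ((65, "F", 25000), "flu"),
    ((70, "F", 30000), "bronchitis"),
]
groups = greedy_partition(MICRODATA, 2)
```

Because every group holds at least l tuples with pairwise distinct sensitive values, each value appears once in a group of m >= l tuples, which satisfies the m / l limit. The partition it produces generally differs from the one used in the running example.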
Accuracy of Data Analysis
Summary
• Anatomy outperforms generalization by allowing
much more accurate data analysis on the
published data.
• Anatomized tables (with a nearly optimal quality
guarantee) can be computed in I/O cost linear in
the database cardinality.
Anatomy: Simple and Effective Privacy Preservation

  • 25. Summary • Anatomy outperforms generalization by allowing much more accurate data analysis on the published data. • Anatomized tables (with nearly optimal quality guarantee) can be computed in I/O cost linear to the database cardinality.