Anonymizing Health Data
Webcast
Case Studies and Methods to Get You S
Khaled El Emam & Luk Arbuckle
Part 1 of Webcast: Intro and Methodology
Part 2 of Webcast: A Look at Our Case Studies
Part 3 of Webcast: Questions and Answers
Part 1 of Webcast: Intro and Methodology
To Anonymize or not to Anonymize
Consent needs to be informed.
Not all health care providers are willing to share their patients’ PHI.
Anonymization allows for the sharing of health information.
Compelling financial case: a breach costs ~$200 per patient.
Privacy protective behaviors by patients.
Masking Standards
First name, last name, SSN.
Distortion of data: no analytics.
Creating pseudonyms.
Removing a whole field.
Replacing actual values with random ones.
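The three masking techniques can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the field names, the secret key, and the surname pool are hypothetical, and HMAC-SHA256 is assumed here as one common way to generate consistent pseudonyms.

```python
import hashlib
import hmac
import random

# Assumption: a secret key held only by the data custodian (hypothetical value).
SECRET_KEY = b"custodian-secret-key"

# Hypothetical pool of surnames used as random replacement values.
FAKE_SURNAMES = ["Smith", "Tremblay", "Nguyen", "Garcia", "Patel"]


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Consistent pseudonym via a keyed hash (HMAC-SHA256), truncated to 16 hex chars.

    The same input always gives the same token, so a patient's records can
    still be grouped, but the token cannot be recomputed without the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_record(record: dict, rng: random.Random) -> dict:
    """Mask the direct identifiers of one record; other fields pass through."""
    masked = dict(record)
    masked.pop("first_name", None)                   # remove a whole field
    masked["ssn"] = pseudonym(record["ssn"])         # create a pseudonym
    masked["last_name"] = rng.choice(FAKE_SURNAMES)  # replace with a random value
    return masked


if __name__ == "__main__":
    patient = {"first_name": "Ana", "last_name": "Silva", "ssn": "123-45-6789", "age": 55}
    print(mask_record(patient, random.Random(0)))
```

Fields such as `age` come through untouched: masking only addresses direct identifiers, and fields like age are the concern of the de-identification standards.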
De-identification Standards
Age, sex, race, address, income.
Minimal distortion of data: for analytics.
Safe Harbor in HIPAA Privacy Rule.
Heuristics, or rules of thumb.
Statistical method in HIPAA Privacy Rule.
HIPAA Privacy Rule Safe Harbor: What’s “Actual Knowledge”?
Info, alone or in combination, that could identify an individual.
Has to be specific to the data set, not theoretical.
Example: occupation “Mayor of Gotham”.
Khaled El Emam & Luk
Anonymizing Health Data
De-identification Myths
Myth: It’s possible to re-identify most, if not
all, data.
Myth: Genomic sequences are not
identifiable, or are easy to re-identify.
In some cases can re-identify, difficult to de-
identify using our methods.
Using robust methods, evidence suggests risk
can be very small.
Khaled El Emam & Luk
Khaled El Emam & Luk
Anonymizing Health Data
A Risk-based De-identification Methodology
The risk of re-identification can be quantified.
The Goldilocks principle:
balancing privacy with data utility.
De-identification involves a mix of technical, contractual,
and other measures.
The re-identification risk needs to be very small.
Khaled El Emam & Luk
Steps in the De-identification Methodology
Step 1: Select Direct and Indirect Identifiers
Step 2: Setting the Threshold
Step 3: Examining Plausible Attacks
Step 4: De-identifying the Data
Step 5: Documenting the Process
Khaled El Emam & Luk
Anonymizing Health Data
Direct identifiers: name, telephone number, health
insurance card number, medical record number.
Indirect identifiers, or quasi-identifiers: sex, date of birth,
ethnicity, locations, event dates, medical codes.
Step 1: Select Direct and Indirect Identifiers
Khaled El Emam & Luk
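Step 1 amounts to tagging every field with a role before any transformation is applied. A minimal sketch, assuming hypothetical column names; the "other" role for non-identifying analytic variables is an assumption added for illustration.

```python
from __future__ import annotations

# Hypothetical column names mapped to the Step 1 roles.
FIELD_ROLES = {
    "name": "direct",
    "telephone_number": "direct",
    "health_insurance_card_number": "direct",
    "medical_record_number": "direct",
    "sex": "quasi",
    "date_of_birth": "quasi",
    "ethnicity": "quasi",
    "postal_code": "quasi",     # a location
    "admission_date": "quasi",  # an event date
    "diagnosis_code": "quasi",  # a medical code
    "lab_result": "other",      # analytic variable, not an identifier (assumed)
}


def split_fields(roles: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (direct identifiers, quasi-identifiers) in declaration order."""
    direct = [f for f, role in roles.items() if role == "direct"]
    quasi = [f for f, role in roles.items() if role == "quasi"]
    return direct, quasi


if __name__ == "__main__":
    direct, quasi = split_fields(FIELD_ROLES)
    print("mask:", direct)
    print("measure risk on:", quasi)
```

Direct identifiers would then be masked, while the quasi-identifiers are the fields whose re-identification risk is measured and reduced in the later steps.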
Step 2: Setting the Threshold
Maximum acceptable risk for sharing data.
Needs to be quantitative and defensible.
Is the data going to be in the public domain?
What is the extent of the invasion of privacy if the data is shared?
Step 3: Examining Plausible Attacks
Recipient deliberately attempts to re-identify the data.
Recipient inadvertently re-identifies the data (“Holy Smokes, I know her!”).
Data breach at recipient’s site (“data gone wild”).
Adversary launches a demonstration attack on the data.
Step 4: De-identifying the Data
Generalization: reducing the precision of a field.
Dates converted to month/year, or year.
Suppression: replacing a cell with NULL.
Unique 55-year-old female in a birth registry.
Sub-sampling: releasing a simple random sample.
50% of the data set instead of all the data.
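A minimal sketch of the three transformations, assuming records are plain Python dicts. The function names are hypothetical, and suppressing only records with a unique quasi-identifier combination is a simple illustrative rule, not the full risk-based procedure.

```python
from __future__ import annotations

import random
from collections import Counter
from datetime import date


def generalize_date(d: date, level: str = "month") -> str:
    """Generalization: reduce a date to month/year or to year only."""
    if level == "month":
        return f"{d.month:02d}/{d.year}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown level: {level!r}")


def suppress_unique(records: list[dict], quasi: list[str], field: str) -> list[dict]:
    """Suppression: set `field` to None (NULL) in records whose quasi-identifier
    combination occurs only once, e.g. the only 55-year-old female."""
    counts = Counter(tuple(r[q] for q in quasi) for r in records)
    out = []
    for r in records:
        r = dict(r)  # copy so the input is not modified
        if counts[tuple(r[q] for q in quasi)] == 1:
            r[field] = None
        out.append(r)
    return out


def subsample(records: list, fraction: float, seed: int = 0) -> list:
    """Sub-sampling: a simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, round(len(records) * fraction))


if __name__ == "__main__":
    people = [{"age": 55, "sex": "F"}, {"age": 30, "sex": "F"}, {"age": 30, "sex": "F"}]
    print(generalize_date(date(2011, 3, 14)))  # 03/2011
    print(suppress_unique(people, ["age", "sex"], "age"))
    print(len(subsample(list(range(10)), 0.5)))  # 5
```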
Step 5: Documenting the Process
Process documentation: a methodology text.
Results documentation: data set, risk thresholds, assumptions, evidence of low risk.
Measuring Risk Under Plausible Attacks
T1: Deliberate Attempt
Pr(re-id, attempt) = Pr(attempt) Pr(re-id | attempt)
T2: Inadvertent Attempt (“Holy Smokes, I know her!”)
Pr(re-id, acquaintance) = Pr(acquaintance) Pr(re-id | acquaintance)
T3: Data Breach (“data gone wild”)
Pr(re-id, breach) = Pr(breach) Pr(re-id | breach)
T4: Public Data (demonstration attack)
Pr(re-id), based on the data set only
Choosing Thresholds
Many precedents going back multiple decades.
Recommended by regulators.
All based on maximum risk, though.
Part 2 of Webcast: A Look at Our Case Studies
Cross Sectional Data: Research Registries
Better Outcomes Registry & Network (BORN) of Ontario.
140,000 births per year.
Cross-sectional: mothers not traced over time.
Process of getting de-identified data from a research registry.
Researcher Ronnie wants data!
919,710 records from 2005-2011.
Choosing Thresholds
Average risk of 0.1 for Researcher Ronnie (and the data he specifically requested).
0.05 if there were highly sensitive variables (congenital anomalies, mental health problems).
Measuring Risk Under Plausible Attacks
T1: Deliberate Attempt
Low motives and capacity; low mitigating controls.
Pr(attempt) = 0.4
T2: Inadvertent Attempt (“Holy Smokes, I know her!”)
119,785 births out of 4,478,500 women (= 0.027).
Pr(acquaintance) = 1 - (1 - 0.027)^(150/2) = 0.87
T3: Data Breach (“data gone wild”)
Based on historical data.
Pr(breach) = 0.27
T4: Public Data (demonstration attack)
Overall risk
Pr(re-id, T) = Pr(T) × Pr(re-id | T) ≤ 0.1
Acquaintance has the highest probability (0.87), so it sets the tightest bound:
Pr(re-id, acquaintance) = 0.87 × Pr(re-id | acquaintance) ≤ 0.1
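The arithmetic behind these slides can be reproduced directly. This is a sketch using the slide figures; reading the exponent 150/2 as roughly 150 acquaintances per person, about half of them women, is an assumption about the slide's intent.

```python
# Inputs from the BORN slides.
births = 119_785
women = 4_478_500
p_in_data = births / women  # ~0.027: chance a given woman appears in the data set

# Assumption: the exponent 150/2 reflects ~150 acquaintances, about half of them women.
n_women_known = 150 / 2

# Probability the recipient knows at least one woman in the data set.
p_acquaintance = 1 - (1 - p_in_data) ** n_women_known

attack_probs = {"attempt": 0.4, "acquaintance": p_acquaintance, "breach": 0.27}
threshold = 0.1  # average-risk threshold chosen for Researcher Ronnie

# Pr(re-id, T) = Pr(T) * Pr(re-id | T) <= threshold
# so Pr(re-id | T) <= threshold / Pr(T); the most likely attack gives the smallest bound.
bounds = {t: threshold / p for t, p in attack_probs.items()}
binding = max(attack_probs, key=attack_probs.get)

print(f"Pr(acquaintance) = {p_acquaintance:.2f}")
print(f"binding attack: {binding}, Pr(re-id | {binding}) <= {bounds[binding]:.3f}")
```

This prints `Pr(acquaintance) = 0.87` and a bound of about 0.115 on Pr(re-id | acquaintance), which is the target the de-identified data set has to meet.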
Meeting Thresholds: k-anonymity
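A minimal sketch of how k-anonymity relates to the average and maximum risk measures, assuming the common definitions: each record's re-identification probability is 1 divided by the size of its equivalence class (records sharing the same quasi-identifier values), and the data set is treated as the whole population.

```python
from collections import Counter


def risk_summary(records: list, quasi: list) -> tuple:
    """Return (k, maximum risk, average risk) over the quasi-identifiers.

    Assumes each record's re-identification probability is 1 / (size of its
    equivalence class), treating the data set itself as the population.
    """
    classes = Counter(tuple(r[q] for q in quasi) for r in records)
    k = min(classes.values())               # every class has at least k records
    max_risk = 1 / k                        # a record in the smallest class
    avg_risk = len(classes) / len(records)  # mean of 1/size over all records
    return k, max_risk, avg_risk


if __name__ == "__main__":
    data = [
        {"birth_year": 1980, "sex": "F"},
        {"birth_year": 1980, "sex": "F"},
        {"birth_year": 1980, "sex": "F"},
        {"birth_year": 1975, "sex": "M"},
        {"birth_year": 1975, "sex": "M"},
    ]
    print(risk_summary(data, ["birth_year", "sex"]))  # (2, 0.5, 0.4)
```

Average risk can never exceed maximum risk, which is why an average-risk threshold such as Researcher Ronnie's 0.1 is less strict than the same number used as a maximum.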
De-identifying the Data Set
MDOB in 1-yy; BDOB in wk/yy; MPC of 1 char.
MDOB in 10-yy; BDOB in qtr/yy; MPC of 3 chars.
MDOB in 10-yy; BDOB in mm/yy; MPC of 3 chars.
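The third scheme listed above can be sketched as a set of generalization functions. The expansions are assumptions, not stated on the slides: MDOB is read as the mother's date of birth, BDOB as the baby's date of birth, and MPC as the mother's postal code; year bands are assumed to align to multiples of the band width.

```python
from datetime import date


def year_band(d: date, width: int = 10) -> str:
    """Generalize a date to a band of `width` years aligned to multiples of width."""
    start = d.year - d.year % width
    return f"{start}-{start + width - 1}"


def month_year(d: date) -> str:
    """Generalize a date to month/year (mm/yy)."""
    return d.strftime("%m/%y")


def postal_prefix(code: str, chars: int = 3) -> str:
    """Keep only the first `chars` characters of a postal code."""
    return code.replace(" ", "").upper()[:chars]


def generalize_born(record: dict) -> dict:
    """Third scheme on the slide: MDOB in 10-yy, BDOB in mm/yy, MPC of 3 chars."""
    return {
        "MDOB": year_band(record["MDOB"], 10),
        "BDOB": month_year(record["BDOB"]),
        "MPC": postal_prefix(record["MPC"], 3),
    }


if __name__ == "__main__":
    rec = {"MDOB": date(1978, 6, 2), "BDOB": date(2009, 11, 20), "MPC": "k1h 8l6"}
    print(generalize_born(rec))  # {'MDOB': '1970-1979', 'BDOB': '11/09', 'MPC': 'K1H'}
```

Each candidate scheme would then be evaluated against the risk threshold, and the least distorting scheme that meets it kept.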
Year on Year: Re-using Risk Analyses
In 2006 Researcher Ronnie asks for 2005.
In 2007 Researcher Ronnie asks for 2006.
In 2008 Researcher Ronnie asks for 2007.
In 2009 Researcher Ronnie asks for 2008.
In 2010 Researcher Ronnie asks for 2009.
The previous year's data is deleted when the next year's data is requested.
Can we use the same de-identification scheme every year?
BORN data pertains to very stable populations.
No dramatic changes in the number or characteristics of births from 2005-2010.
Revisit the de-identification scheme every 18 to 24 months.
Revisit it if any new quasi-identifiers are added or changed.
Longitudinal Discharge Abstract Data: State Inpatient Databases
Linking a patient’s records over time.
These records need to be de-identified differently.
Meeting Thresholds: k-anonymity?
De-identifying Under Complete Knowledge
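The slides leave the details to the talk, so the following is only a sketch under one stated assumption: "complete knowledge" means the adversary knows the quasi-identifier values of every record belonging to a patient. Under that assumption the unit of risk is the patient's full profile, not a single record, which is one reason record-level k-anonymity can look safer than it is for longitudinal data. The field names are hypothetical.

```python
from collections import Counter, defaultdict


def record_level_k(visits: list, quasi: list) -> int:
    """k computed as if each visit were an independent record."""
    return min(Counter(tuple(v[q] for q in quasi) for v in visits).values())


def patient_level_k(visits: list, patient_key: str, quasi: list) -> int:
    """k computed over each patient's complete set of quasi-identifier values.

    Under complete knowledge the adversary is assumed to know every visit, so
    two patients are indistinguishable only if their full profiles match.
    """
    profiles = defaultdict(list)
    for v in visits:
        profiles[v[patient_key]].append(tuple(v[q] for q in quasi))
    # Sorting makes the profile independent of record order (values must be comparable).
    counts = Counter(tuple(sorted(p)) for p in profiles.values())
    return min(counts.values())


if __name__ == "__main__":
    visits = [
        {"pid": 1, "age": "40-44", "dx": "A"},
        {"pid": 1, "age": "40-44", "dx": "B"},
        {"pid": 2, "age": "40-44", "dx": "A"},
        {"pid": 3, "age": "40-44", "dx": "B"},
    ]
    print(record_level_k(visits, ["age", "dx"]))          # 2
    print(patient_level_k(visits, "pid", ["age", "dx"]))  # 1
```

Every visit shares its values with another visit, yet patient 1 is the only patient with both diagnoses, so at the patient level each person is unique.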
State Inpatient Database (SID) of California
Researcher Ronnie wants public data!
Measuring Risk Under Plausible Attacks
T1: Deliberate Attempt
T2: Inadvertent Attempt (“Holy Smokes, I know her!”)
T3: Data Breach (“data gone wild”)
T4: Public Data (demonstration attack)
Pr(re-id) ≤ 0.09 (maximum risk)
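Assuming the common definition that maximum risk is 1 divided by the size of the smallest equivalence class, a maximum-risk threshold translates directly into a minimum class size k. A small sketch; the function name is hypothetical.

```python
def min_class_size(max_risk: float) -> int:
    """Smallest k with 1/k <= max_risk, assuming maximum risk = 1 / smallest class size."""
    if not 0 < max_risk <= 1:
        raise ValueError("max_risk must be in (0, 1]")
    k = 1
    while 1 / k > max_risk:
        k += 1
    return k


if __name__ == "__main__":
    print(min_class_size(0.09))  # 12, because 1/11 is about 0.0909
```

For the 0.09 threshold this requires every equivalence class in the public release to contain at least 12 records, since 1/11 is just above 0.09.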
De-identifying the Data Set
BirthYear in 5-yy (cut at 1910-);
AdmissionYear unchanged;
DaysSinceLastService in 28-dd (cut at 7-, 182+);
LengthOfStay same as DaysSinceLastService.
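A sketch of this SID scheme, under assumptions the slide does not spell out: 5-year bands aligned to multiples of five, "1910-" read as "born before 1910", "7-" as "fewer than 7 days", "182+" as "182 days or more", and 28-day intervals starting at day 7 (so the last interval is shortened to 175-181).

```python
def birth_year_band(year: int) -> str:
    """BirthYear in 5-year bands; all years before 1910 collapse into '1910-' (assumed meaning)."""
    if year < 1910:
        return "1910-"
    start = year - year % 5
    return f"{start}-{start + 4}"


def day_interval(days: int) -> str:
    """28-day intervals, bottom-coded at '7-' and top-coded at '182+' (assumed boundaries)."""
    if days < 7:
        return "7-"
    if days >= 182:
        return "182+"
    start = 7 + (days - 7) // 28 * 28
    return f"{start}-{min(start + 27, 181)}"


def generalize_sid(record: dict) -> dict:
    """Apply the scheme to one discharge record with the SID field names."""
    return {
        "BirthYear": birth_year_band(record["BirthYear"]),
        "AdmissionYear": record["AdmissionYear"],  # unchanged
        "DaysSinceLastService": day_interval(record["DaysSinceLastService"]),
        "LengthOfStay": day_interval(record["LengthOfStay"]),  # same scheme
    }


if __name__ == "__main__":
    print(generalize_sid({"BirthYear": 1950, "AdmissionYear": 2009,
                          "DaysSinceLastService": 10, "LengthOfStay": 2}))
```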
Connected Variables
QI to QI
Similar QI? Same generalization and suppression.
QI to non-QI
Non-QI is revealing? Same suppression, so both are removed.
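The suppression rule for connected variables can be sketched as follows. The field pairs are hypothetical examples: LengthOfStay and DaysSinceLastService stand in for similar quasi-identifiers, and a medication is assumed to be a non-QI that could reveal a diagnosis.

```python
# Hypothetical connections between fields in a discharge abstract.
CONNECTED = [
    ("LengthOfStay", "DaysSinceLastService"),  # QI to QI: similar quasi-identifiers
    ("DiagnosisCode", "Medication"),           # QI to non-QI: medication may reveal the diagnosis
]


def suppress_connected(record: dict, field: str) -> dict:
    """Suppress `field` and any field directly connected to it, so neither
    value can be used to infer the other."""
    out = dict(record)
    to_clear = {field}
    for a, b in CONNECTED:
        if field in (a, b):
            to_clear.update((a, b))
    for f in to_clear:
        if f in out:
            out[f] = None
    return out


if __name__ == "__main__":
    r = {"DiagnosisCode": "F32", "Medication": "sertraline",
         "LengthOfStay": 3, "DaysSinceLastService": 40}
    print(suppress_connected(r, "DiagnosisCode"))
```

Only direct connections are followed here; a real implementation would also need to handle chains of connected fields.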
Other Issues Regarding Longitudinal Data
Date shifting: maintaining order of records.
Long tails: truncation of records.
Adversary power: assumption of knowledge.
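One common way to shift dates while maintaining the order of a patient's records is to apply a single random offset per patient; a sketch of that, plus a simple truncation of long tails, follows. The ±30-day range, the field names, and the keep-the-earliest-records truncation rule are all assumptions for illustration.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def shift_dates(visits, patient_key="pid", date_key="date", max_days=30, seed=0):
    """Shift all of a patient's dates by one random offset, preserving their order and gaps."""
    rng = random.Random(seed)
    offsets = {}
    shifted = []
    for v in visits:
        pid = v[patient_key]
        if pid not in offsets:
            offsets[pid] = timedelta(days=rng.randint(-max_days, max_days))
        shifted.append({**v, date_key: v[date_key] + offsets[pid]})
    return shifted


def truncate_long_tails(visits, patient_key="pid", date_key="date", max_records=10):
    """Keep at most `max_records` earliest records per patient (hypothetical cut-off)."""
    by_patient = defaultdict(list)
    for v in visits:
        by_patient[v[patient_key]].append(v)
    kept = []
    for recs in by_patient.values():
        kept.extend(sorted(recs, key=lambda r: r[date_key])[:max_records])
    return kept


if __name__ == "__main__":
    visits = [{"pid": 1, "date": date(2010, 1, 1)},
              {"pid": 1, "date": date(2010, 3, 1)},
              {"pid": 2, "date": date(2010, 2, 1)}]
    print(shift_dates(visits))
    print(truncate_long_tails(visits, max_records=1))
```

The order is preserved within each patient only; records of different patients can change order relative to each other.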
Other Concerns to Think About
Free-form text: anonymization.
Geospatial information: aggregation and geoproxy risk.
Medical codes: generalization, suppression, shuffling (yes, as in cards).
Secure linking: linking data through encryption before anonymization.
Part 3 of Webcast: Questions and Answers
More Comments or Questions: Contact us!
Khaled El Emam: kelemam@privacyanalytics.ca
Luk Arbuckle: larbuckle@privacyanalytics.ca

More Related Content

What's hot

Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
Sankhadip Kundu
 
Investigating Using the Dark Web
Investigating Using the Dark WebInvestigating Using the Dark Web
Investigating Using the Dark Web
Case IQ
 
Privacy & Data Protection in the Digital World
Privacy & Data Protection in the Digital WorldPrivacy & Data Protection in the Digital World
Privacy & Data Protection in the Digital World
Arab Federation for Digital Economy
 
Corporate threat vector and landscape
Corporate threat vector and landscapeCorporate threat vector and landscape
Corporate threat vector and landscape
yohansurya2
 
osint - open source Intelligence
osint - open source Intelligenceosint - open source Intelligence
osint - open source Intelligence
Osama Ellahi
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
kalpesh1908
 
PHISHING PROTECTION
PHISHING PROTECTIONPHISHING PROTECTION
PHISHING PROTECTION
Sylvain Martinez
 
Cyber threat intelligence ppt
Cyber threat intelligence pptCyber threat intelligence ppt
Cyber threat intelligence ppt
Kumar Gaurav
 
Social Engineering
Social EngineeringSocial Engineering
Social Engineering
primeteacher32
 
Data explosion
Data explosionData explosion
Data explosion
G Prachi
 
Data Privacy: What you need to know about privacy, from compliance to ethics
Data Privacy: What you need to know about privacy, from compliance to ethicsData Privacy: What you need to know about privacy, from compliance to ethics
Data Privacy: What you need to know about privacy, from compliance to ethics
AT Internet
 
Information security
Information securityInformation security
Information security
Mustahid Ali
 
Cyber security
Cyber securityCyber security
Cyber security
Bhavin Shah
 
Social Media Forensics for Investigators
Social Media Forensics for InvestigatorsSocial Media Forensics for Investigators
Social Media Forensics for Investigators
Case IQ
 
Data Security Explained
Data Security ExplainedData Security Explained
Data Security Explained
Happiest Minds Technologies
 
Cyber threat Intelligence and Incident Response by:-Sandeep Singh
Cyber threat Intelligence and Incident Response by:-Sandeep SinghCyber threat Intelligence and Incident Response by:-Sandeep Singh
Cyber threat Intelligence and Incident Response by:-Sandeep Singh
OWASP Delhi
 
Fundamental digital forensik
Fundamental digital forensikFundamental digital forensik
Fundamental digital forensik
newbie2019
 
Digital Transformation & Internet of Everything
Digital Transformation & Internet of EverythingDigital Transformation & Internet of Everything
Digital Transformation & Internet of Everything
Agence du Numérique (AdN)
 
Social Engineering new.pptx
Social Engineering new.pptxSocial Engineering new.pptx
Social Engineering new.pptx
Santhosh Prabhu
 
Capital One Data Breach
Capital One Data BreachCapital One Data Breach
Capital One Data Breach
Obika Gellineau
 

What's hot (20)

Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
 
Investigating Using the Dark Web
Investigating Using the Dark WebInvestigating Using the Dark Web
Investigating Using the Dark Web
 
Privacy & Data Protection in the Digital World
Privacy & Data Protection in the Digital WorldPrivacy & Data Protection in the Digital World
Privacy & Data Protection in the Digital World
 
Corporate threat vector and landscape
Corporate threat vector and landscapeCorporate threat vector and landscape
Corporate threat vector and landscape
 
osint - open source Intelligence
osint - open source Intelligenceosint - open source Intelligence
osint - open source Intelligence
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
 
PHISHING PROTECTION
PHISHING PROTECTIONPHISHING PROTECTION
PHISHING PROTECTION
 
Cyber threat intelligence ppt
Cyber threat intelligence pptCyber threat intelligence ppt
Cyber threat intelligence ppt
 
Social Engineering
Social EngineeringSocial Engineering
Social Engineering
 
Data explosion
Data explosionData explosion
Data explosion
 
Data Privacy: What you need to know about privacy, from compliance to ethics
Data Privacy: What you need to know about privacy, from compliance to ethicsData Privacy: What you need to know about privacy, from compliance to ethics
Data Privacy: What you need to know about privacy, from compliance to ethics
 
Information security
Information securityInformation security
Information security
 
Cyber security
Cyber securityCyber security
Cyber security
 
Social Media Forensics for Investigators
Social Media Forensics for InvestigatorsSocial Media Forensics for Investigators
Social Media Forensics for Investigators
 
Data Security Explained
Data Security ExplainedData Security Explained
Data Security Explained
 
Cyber threat Intelligence and Incident Response by:-Sandeep Singh
Cyber threat Intelligence and Incident Response by:-Sandeep SinghCyber threat Intelligence and Incident Response by:-Sandeep Singh
Cyber threat Intelligence and Incident Response by:-Sandeep Singh
 
Fundamental digital forensik
Fundamental digital forensikFundamental digital forensik
Fundamental digital forensik
 
Digital Transformation & Internet of Everything
Digital Transformation & Internet of EverythingDigital Transformation & Internet of Everything
Digital Transformation & Internet of Everything
 
Social Engineering new.pptx
Social Engineering new.pptxSocial Engineering new.pptx
Social Engineering new.pptx
 
Capital One Data Breach
Capital One Data BreachCapital One Data Breach
Capital One Data Breach
 

Viewers also liked

Big Data Meets Privacy:De-identification Maturity Model for Benchmarking and ...
Big Data Meets Privacy:De-identification Maturity Model for Benchmarking and ...Big Data Meets Privacy:De-identification Maturity Model for Benchmarking and ...
Big Data Meets Privacy:De-identification Maturity Model for Benchmarking and ...
Khaled El Emam
 
An overview of methods for data anonymization
An overview of methods for data anonymizationAn overview of methods for data anonymization
An overview of methods for data anonymization
arx-deidentifier
 
Privacy in the Age of Big Data
Privacy in the Age of Big DataPrivacy in the Age of Big Data
Privacy in the Age of Big Data
marcgallardo
 
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
arx-deidentifier
 
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical dataARX - a comprehensive tool for anonymizing / de-identifying biomedical data
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data
arx-deidentifier
 
Engineering data privacy - The ARX data anonymization tool
Engineering data privacy - The ARX data anonymization toolEngineering data privacy - The ARX data anonymization tool
Engineering data privacy - The ARX data anonymization tool
arx-deidentifier
 


Anonymizing Health Data

  • 1. Anonymizing Health Data Webcast Case Studies and Methods to Get You S Khaled El Emam & Luk
  • 2. Part 1 of Webcast: Intro and Methodology. Part 2 of Webcast: A Look at Our Case Studies. Part 3 of Webcast: Questions and Answers.
  • 3. Part 1 of Webcast: Intro and Methodology
  • 4-10. To Anonymize or not to Anonymize: Consent needs to be informed. Not all health care providers are willing to share their patients' PHI. Anonymization allows for the sharing of health information. Compelling financial case: a breach costs ~$200 per patient. Privacy protective behaviors by patients.
  • 11-16. Masking Standards: applied to fields such as first name, last name, and SSN. Distortion of data, so no analytics. Techniques: creating pseudonyms, removing a whole field, and replacing actual values with random ones.
  • 17-20. De-identification Standards: applied to fields such as age, sex, race, address, and income. Minimal distortion of data, so analytics remain possible. Example: Safe Harbor in the HIPAA Privacy Rule.
  • 21-24. Privacy Rule Safe Harbor: What’s “Actual Knowledge”? Information, alone or in combination, that could identify an individual. It has to be specific to the data set, not theoretical. Example: the occupation “Mayor of Gotham”.
  • 25-26. De-identification Standards: minimal distortion of data, for analytics, applied to fields such as age, sex, race, address, and income. Safe Harbor in the HIPAA Privacy Rule: heuristics, or rules of thumb. Statistical method in the HIPAA Privacy Rule.
  • 27-31. De-identification Myths. Myth: It’s possible to re-identify most, if not all, data. Using robust methods, evidence suggests the risk can be very small. Myth: Genomic sequences are not identifiable, or are easy to re-identify. In some cases they can be re-identified, and they are difficult to de-identify using our methods.
  • 32-37. A Risk-based De-identification Methodology: The risk of re-identification can be quantified. The Goldilocks principle: balancing privacy with data utility. The re-identification risk needs to be very small. De-identification involves a mix of technical, contractual, and other measures.
  • 38. Steps in the De-identification Methodology: Step 1: Select Direct and Indirect Identifiers. Step 2: Setting the Threshold. Step 3: Examining Plausible Attacks. Step 4: De-identifying the Data. Step 5: Documenting the Process.
  • 39-41. Step 1: Select Direct and Indirect Identifiers. Direct identifiers: name, telephone number, health insurance card number, medical record number. Indirect identifiers, or quasi-identifiers: sex, date of birth, ethnicity, locations, event dates, medical codes.
  • 42-46. Step 2: Setting the Threshold. The threshold is the maximum acceptable risk for sharing data, and it needs to be quantitative and defensible. Is the data going to be in the public domain? What is the extent of the invasion of privacy when the data is shared?
  • 47-51. Step 3: Examining Plausible Attacks. The recipient deliberately attempts to re-identify the data. The recipient inadvertently re-identifies the data (“Holy smokes, I know her!”). A data breach occurs at the recipient’s site (“data gone wild”). An adversary launches a demonstration attack on the data.
  • 52-55. Step 4: De-identifying the Data. Generalization: reducing the precision of a field (e.g., dates converted to month/year, or year). Suppression: replacing a cell with NULL (e.g., a unique 55-year-old female in a birth registry). Sub-sampling: releasing a simple random sample (e.g., 50% of the data set instead of all data).
  • 56-58. Step 5: Documenting the Process. Process documentation: a methodology text. Results documentation: data set, risk thresholds, assumptions, evidence of low risk.
  • 59-63. Measuring Risk Under Plausible Attacks. T1: Deliberate Attempt, Pr(re-id, attempt) = Pr(attempt) Pr(re-id | attempt). T2: Inadvertent Attempt (“Holy smokes, I know her!”), Pr(re-id, acquaintance) = Pr(acquaintance) Pr(re-id | acquaintance). T3: Data Breach (“data gone wild”), Pr(re-id, breach) = Pr(breach) Pr(re-id | breach). T4: Public Data (demonstration attack), Pr(re-id) based on the data set only.
  • 64-68. Choosing Thresholds: Many precedents going back multiple decades. Recommended by regulators. All based on max risk, though.
  • 69. Part 2 of Webcast: A Look at Our Case Studies
  • 70-75. Cross-Sectional Data: Research Registries. Better Outcomes Registry & Network (BORN) of Ontario: 140,000 births per year. Cross-sectional: mothers are not traced over time. Process of getting de-identified data from a research registry.
  • 78. Researcher Ronnie wants data! 919,710 records from 2005-2011.
  • 81. Choosing Thresholds. Average risk of 0.1 for Researcher Ronnie (and the data he specifically requested); 0.05 if there were highly sensitive variables (congenital anomalies, mental health problems).
  • 85. Measuring Risk Under Plausible Attacks. T1: Deliberate Attempt. Low motives and capacity; low mitigating controls. Pr(attempt) = 0.4.
  • 87. Measuring Risk Under Plausible Attacks. T2: Inadvertent Attempt (“Holy Smokes, I know her!”). 119,785 births out of 4,478,500 women (prevalence = 0.027). Pr(acquaintance) = 1 - (1 - 0.027)^(150/2) ≈ 0.87.
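The acquaintance probability above can be reproduced directly. This is a minimal sketch assuming, as the notes say, 150 friends (the Dunbar number) halved because only women are considered, and treating each acquaintance as independently appearing in the data with probability equal to the prevalence. The function name `pr_acquaintance` is ours, not the authors'.

```python
def pr_acquaintance(prevalence: float, acquaintances: float) -> float:
    """Probability that at least one acquaintance appears in the data set,
    assuming each one appears independently with probability `prevalence`."""
    return 1.0 - (1.0 - prevalence) ** acquaintances


# Prevalence from the slide: 119,785 births out of 4,478,500 women
# (the worst-case year, 2008, according to the notes).
prevalence = round(119_785 / 4_478_500, 3)  # 0.027
acquaintances = 150 / 2                     # Dunbar number, women only

p = pr_acquaintance(prevalence, acquaintances)
print(f"Pr(acquaintance) = {p:.2f}")        # Pr(acquaintance) = 0.87
```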
  • 89. Measuring Risk Under Plausible Attacks. T3: Data Breach (“data gone wild”). Based on historical data, Pr(breach) = 0.27.
  • 90. Measuring Risk Under Plausible Attacks. T4: Public Data (demonstration attack).
  • 91. Measuring Risk Under Plausible Attacks. Overall risk for each attack T: Pr(re-id, T) = Pr(T) × Pr(re-id | T) ≤ 0.1.
  • 92. Measuring Risk Under Plausible Attacks. T2: Inadvertent Attempt (“Holy Smokes, I know her!”). With Pr(acquaintance) ≈ 0.87, the overall risk is Pr(re-id, acquaintance) = 0.87 × Pr(re-id | acquaintance) ≤ 0.1, so Pr(re-id | acquaintance) ≤ 0.1 / 0.87 ≈ 0.115.
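Slides 91 and 92 turn the overall threshold into a bound on Pr(re-id | T) by dividing by the probability of each attack. The sketch below applies that rule to the three attack probabilities given in this case study (0.4, 0.87, 0.27); the dictionary layout and the `conditional_thresholds` name are ours. The most probable attack gives the strictest bound, and that bound is the one the data set has to meet.

```python
def conditional_thresholds(threshold: float, attack_probs: dict) -> dict:
    """Largest Pr(re-id | T) per attack T such that
    Pr(T) * Pr(re-id | T) <= threshold (capped at 1.0)."""
    return {t: min(1.0, threshold / p) for t, p in attack_probs.items()}


# Attack probabilities given for the BORN case study.
attacks = {
    "T1 deliberate attempt": 0.4,
    "T2 inadvertent attempt": 0.87,
    "T3 data breach": 0.27,
}
limits = conditional_thresholds(0.1, attacks)
for name, limit in limits.items():
    print(f"{name}: Pr(re-id | T) <= {limit:.3f}")

# The released data set has to meet the strictest (smallest) bound.
binding = min(limits, key=limits.get)
print(f"binding: {binding} ({limits[binding]:.3f})")
```

This prints bounds of 0.250, 0.370 and 0.115; the inadvertent (acquaintance) attack is the binding one.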
  • 93. De-identifying the Data Set.
  • 95. Meeting Thresholds: k-anonymity.
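A k-anonymity check groups records into equivalence classes on the quasi-identifiers, and each record's re-identification probability is taken as 1 divided by the size of its class. The sketch below assumes records are plain dicts; the quasi-identifier names and values are made up for illustration. Max risk is 1/k for the smallest class, and average risk works out to the number of classes divided by the number of records. With k = 11 (the value in the notes), max risk would be 1/11 ≈ 0.091.

```python
from collections import Counter


def risk_profile(records: list, quasi_identifiers: list) -> tuple:
    """Return (k, max_risk, avg_risk) for a list of dict records."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)          # equivalence class -> number of records
    k = min(class_sizes.values())
    max_risk = 1.0 / k
    # Mean over records of 1/class_size == number_of_classes / number_of_records
    avg_risk = len(class_sizes) / len(records)
    return k, max_risk, avg_risk


# Hypothetical, already-generalized records (illustrative only).
records = (
    [{"mother_age": "30-39", "region": "K"}] * 12
    + [{"mother_age": "20-29", "region": "K"}] * 11
    + [{"mother_age": "20-29", "region": "M"}] * 15
)
k, max_risk, avg_risk = risk_profile(records, ["mother_age", "region"])
print(k, round(max_risk, 3), round(avg_risk, 3))  # 11 0.091 0.079
```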
  • 98. De-identifying the Data Set. (1) MDOB in 1-yy; BDOB in wk/yy; MPC of 1 char. (2) MDOB in 10-yy; BDOB in qtr/yy; MPC of 3 chars. (3) MDOB in 10-yy; BDOB in mm/yy; MPC of 3 chars.
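The three schemes above can be expressed as small generalization functions. This sketch rests on our own reading of the abbreviations: MDOB as the mother's date of birth, BDOB as the baby's date of birth, MPC as the mother's postal code, "n-yy" as n-year intervals aligned on multiples of n, and "wk/yy", "qtr/yy", "mm/yy" as week, quarter or month plus year. The field names and the record are hypothetical.

```python
from datetime import date


def year_band(d: date, width: int) -> str:
    """Generalize a date to a `width`-year interval aligned on multiples of width."""
    start = d.year - d.year % width
    return str(start) if width == 1 else f"{start}-{start + width - 1}"


def week_year(d: date) -> str:
    """ISO week and ISO year, e.g. '12/2008'."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_week:02d}/{iso_year}"


def quarter_year(d: date) -> str:
    return f"Q{(d.month - 1) // 3 + 1}/{d.year}"


def month_year(d: date) -> str:
    return f"{d.month:02d}/{d.year}"


def postal_prefix(postal_code: str, chars: int) -> str:
    """Keep only the first `chars` characters of a postal code."""
    return postal_code.replace(" ", "").upper()[:chars]


def generalize(rec: dict, mdob_years: int, bdob_fn, mpc_chars: int) -> tuple:
    return (
        year_band(rec["mdob"], mdob_years),
        bdob_fn(rec["bdob"]),
        postal_prefix(rec["mpc"], mpc_chars),
    )


# Hypothetical record; field names and values are illustrative only.
record = {"mdob": date(1978, 6, 14), "bdob": date(2008, 3, 22), "mpc": "K1H 8L1"}

print(generalize(record, 1, week_year, 1))      # ('1978', '12/2008', 'K')
print(generalize(record, 10, quarter_year, 3))  # ('1970-1979', 'Q1/2008', 'K1H')
print(generalize(record, 10, month_year, 3))    # ('1970-1979', '03/2008', 'K1H')
```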
  • 105. Year on Year: Re-using Risk Analyses. In 2006 Researcher Ronnie asks for the 2005 data; in 2007, for 2006; in 2008, for 2007; in 2009, for 2008; in 2010, for 2009. Each previous year's data set has been deleted by the time the next one is requested. Can we use the same de-identification scheme every year?
  • 111. Year on Year: Re-using Risk Analyses. BORN data pertain to very stable populations, with no dramatic changes in the number or characteristics of births from 2005-2010. Revisit the de-identification scheme every 18 to 24 months, and whenever any quasi-identifiers are added or changed.
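The rule of thumb on this slide can be written as a small check. This is a sketch that assumes an 18-month review interval (the lower end of the slide's 18 to 24 months) and describes each quasi-identifier by a short string of our own choosing, so both added and redefined quasi-identifiers trigger a review. The names `needs_new_risk_analysis` and `ParityCount` are hypothetical.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def needs_new_risk_analysis(last_analysis: date, today: date,
                            old_qis: dict, new_qis: dict,
                            max_months: int = 18) -> bool:
    """True if the scheme is due for review or the quasi-identifiers changed.

    `old_qis` / `new_qis` map each quasi-identifier name to a description
    of how it is recorded, so added and redefined QIs are both caught."""
    return months_between(last_analysis, today) >= max_months or old_qis != new_qis


qis_2006 = {"MDOB": "date", "BDOB": "date", "MPC": "postal code"}
print(needs_new_risk_analysis(date(2006, 1, 15), date(2007, 3, 1), qis_2006, qis_2006))  # False
print(needs_new_risk_analysis(date(2006, 1, 15), date(2007, 9, 1), qis_2006, qis_2006))  # True
print(needs_new_risk_analysis(date(2006, 1, 15), date(2007, 3, 1), qis_2006,
                              {**qis_2006, "ParityCount": "integer"}))                     # True
```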
  • 114. Longitudinal Discharge Abstract Data: State Inpatient Databases. Linking a patient’s records over time. Such data need to be de-identified differently.
  • 117. Meeting Thresholds: k-anonymity?
  • 121. De-identifying Under Complete Knowledge.
  • 123. State Inpatient Database (SID) of California. Researcher Ronnie wants public data!
  • 127. Measuring Risk Under Plausible Attacks. T1: Deliberate Attempt. T2: Inadvertent Attempt (“Holy Smokes, I know her!”). T3: Data Breach (“data gone wild”). T4: Public Data (demonstration attack): Pr(re-id) ≤ 0.09 (maximum risk).
  • 130. De-identifying the Data Set. BirthYear in 5-yy (cut at 1910-); AdmissionYear unchanged; DaysSinceLastService in 28-dd (cut at 7-, 182+); LengthOfStay same as DaysSinceLastService.
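The SID scheme above combines fixed-width bands with cut-offs at the ends of the range. This sketch reads "cut at 1910-" as bottom-coding birth years before 1910, and "cut at 7-, 182+" as bottom-coding intervals under 7 days and top-coding at 182 days; that reading, the `band` helper, and the record values are our assumptions. LengthOfStay shares the DaysSinceLastService scheme, as the slide specifies.

```python
from typing import Optional


def band(value: int, width: int, low: int, high: Optional[int] = None) -> str:
    """Generalize `value` into `width`-wide bands starting at `low`.

    Values below `low` are bottom-coded as '<low'; values at or above
    `high` (if given) are top-coded as 'high+'."""
    if value < low:
        return f"<{low}"
    if high is not None and value >= high:
        return f"{high}+"
    start = low + (value - low) // width * width
    end = start + width - 1
    if high is not None:
        end = min(end, high - 1)  # last band stops just below the top code
    return f"{start}-{end}"


def generalize_visit(visit: dict) -> dict:
    def days(v: int) -> str:
        # Shared 28-day scheme for both interval quasi-identifiers.
        return band(v, 28, 7, 182)

    return {
        "BirthYear": band(visit["BirthYear"], 5, 1910),
        "AdmissionYear": visit["AdmissionYear"],  # unchanged
        "DaysSinceLastService": days(visit["DaysSinceLastService"]),
        "LengthOfStay": days(visit["LengthOfStay"]),
    }


# Hypothetical visit record.
visit = {"BirthYear": 1923, "AdmissionYear": 2009,
         "DaysSinceLastService": 40, "LengthOfStay": 3}
print(generalize_visit(visit))
# {'BirthYear': '1920-1924', 'AdmissionYear': 2009,
#  'DaysSinceLastService': '35-62', 'LengthOfStay': '<7'}
```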
  • 135. Connected Variables. QI to QI: similar QIs? Apply the same generalization and suppression. QI to non-QI: is the non-QI revealing? Apply the same suppression so both are removed.
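The two connection rules can be sketched as follows. The variable pairs are hypothetical: DaysSinceLastService and LengthOfStay stand in for two similar QIs, and a Procedure code stands in for a non-QI that reveals its QI (Diagnosis). Similar QIs get the same generalization, suppressing either QI in a pair suppresses both, and suppressing a QI also suppresses its revealing non-QI. This is a single pass, which is enough for the pairs listed here.

```python
# Hypothetical connections; variable names are illustrative only.
QI_TO_QI = [("DaysSinceLastService", "LengthOfStay")]  # similar QIs
QI_TO_NON_QI = [("Diagnosis", "Procedure")]            # non-QI that reveals its QI


def generalize_connected(record: dict, generalizers: dict) -> dict:
    """Generalize each connected QI pair with the lead QI's scheme."""
    out = dict(record)
    for lead, follower in QI_TO_QI:
        scheme = generalizers[lead]
        out[lead] = scheme(record[lead])
        out[follower] = scheme(record[follower])
    return out


def suppress_connected(record: dict, suppressed: set) -> dict:
    """Suppress (set to None) the given fields and every field connected to them."""
    fields = set(suppressed)
    for a, b in QI_TO_QI:             # similar QIs are suppressed together
        if a in fields or b in fields:
            fields |= {a, b}
    for qi, non_qi in QI_TO_NON_QI:   # a revealing non-QI follows its QI
        if qi in fields:
            fields.add(non_qi)
    return {k: (None if k in fields else v) for k, v in record.items()}


visit = {"DaysSinceLastService": 40, "LengthOfStay": 3,
         "Diagnosis": "dx-A", "Procedure": "px-B", "AdmissionYear": 2009}
generalized = generalize_connected(visit, {"DaysSinceLastService": lambda d: d // 28 * 28})
print(generalized)
print(suppress_connected(generalized, {"Diagnosis"}))
```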
  • 139. Other Issues Regarding Longitudinal Data. Date shifting: maintaining the order of records. Long tails: truncation of records. Adversary power: assumption of knowledge.
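The first two issues can be illustrated with a minimal sketch. Shifting all of one patient's dates by the same random offset is one common way to keep the order of records; it also preserves the gaps between visits, which other approaches may randomize instead. For the long tail, the sketch assumes the earliest records are kept, since the slide does not say which records are truncated. Function names and the shift range are hypothetical.

```python
import random
from datetime import date, timedelta
from typing import Optional


def shift_patient_dates(dates: list, max_shift_days: int = 365,
                        rng: Optional[random.Random] = None) -> list:
    """Shift all of one patient's dates by the same random number of days.

    Because every date moves by the same offset, the order of the
    patient's records (and the gaps between them) is preserved."""
    rng = rng or random.Random()
    offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    return [d + offset for d in dates]


def truncate_long_tail(dates: list, max_records: int) -> list:
    """Keep at most `max_records` of a patient's records (earliest first)."""
    return sorted(dates)[:max_records]


visits = [date(2009, 1, 5), date(2009, 2, 10), date(2009, 6, 1), date(2009, 6, 20)]
shifted = shift_patient_dates(visits, rng=random.Random(42))
print(shifted)
print(truncate_long_tail(shifted, 3))
```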
  • 144. Other Concerns to Think About. Free-form text: anonymization. Geospatial information: aggregation and geoproxy risk. Medical codes: generalization, suppression, shuffling (yes, as in cards). Secure linking: linking data through encryption before anonymization.
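The slide describes secure linking as encryption applied before anonymization. As a stand-in, this sketch uses a keyed hash (HMAC-SHA256) rather than encryption, so it is not the authors' scheme. Sites that share a secret key derive the same pseudonymous token for the same identifier, records are joined on that token, and only then is the combined data set anonymized. The key, the identifier normalization, and the source records are simplified, hypothetical assumptions.

```python
import hashlib
import hmac


def link_token(identifier: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized identifier, used as a linking key."""
    normalized = identifier.replace(" ", "").upper()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and source records; a real deployment would manage the key securely.
KEY = b"key-held-only-by-the-linking-party"
hospital_a = {"1234 567 890": {"admissions": 2}}
hospital_b = {"1234567890": {"prescriptions": 5}}

tokens_a = {link_token(pid, KEY): rec for pid, rec in hospital_a.items()}
tokens_b = {link_token(pid, KEY): rec for pid, rec in hospital_b.items()}

# Join on the token; the linked data set would then go through anonymization.
linked = {t: {**tokens_a[t], **tokens_b[t]} for t in tokens_a.keys() & tokens_b.keys()}
print(linked)
```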
  • 145. Part 3 of Webcast: Questions and Answers
  • 147. More Comments or Questions: Contact us! Khaled El Emam: kelemam@privacyanalytics.ca; Luk Arbuckle: larbuckle@privacyanalytics.ca

Editor's Notes

  1. A risk-based methodology is consistent with contemporary standards from regulators and governments, and is the approach we present in our book.
  2. This is where things get heavy. We’ll start with some basic principles.
  3. The Goldilocks Principle: the trade-off between perfect data and perfect privacy.
  4. We use masking for direct identifiers, and de-identification for indirect identifiers.
  5. Masking
  6. De-identification
  7. Yahoo!
  8. From a regulatory perspective, it’s important to document the process that was used to de-identify the data set, as well as the results of enacting that process.
  9. The probability of an attack will depend on the controls in place to manage the data (mitigating controls).
  10. On average, people tend to have 150 friends. This is called the Dunbar number.
  11. Based on recent credible evidence, we know that approximately 27% of providers that are supposed to follow the HIPAA Security Rule have a reportable breach every year.
  12. We assume that there is an adversary who has background information that can be used to launch an attack.
  13. So we can measure risk under plausible attacks, but how do we set an overall risk threshold?
  14. Max risk is based on the record that has the highest probability of re-identification; average risk applies when the adversary is trying to re-identify someone they know, or everyone in the data set.
  15. To set the threshold, we can look at the sensitivity of the data and the consent mechanism that was in place (invasion of privacy).
  16. The data he wants...
  17. Based on detailed risk assessment.
  18. Worst case is 2008, with a prevalence of 0.027.
  19. 150/2 friends because only women are considered.
  20. k = 11
  21. Approximate complete knowledge