Health data and the re-identification threat – a real-world example
Giske Ursin
Cancer Registry of Norway
March 5, 2018
Seminar on privacy-preserving distributed
statistical computation,
Statistics Norway
Norwegian health registries
17 central + 54 clinical registries
Purpose:
- Assess distribution of disease
- Obtain information on how to prevent disease and
death from disease
Other health data:
- Population surveys
- 360+ biobanks
Cancer screening programs: all women 25-69
All these data…..
Let’s link them
30/04/2018
Put the data somewhere safe…
Can only access them there….
But….
Kreft i Norge 2015
1. Are the data safe?
National platform
coming….
1. Are the data really SAFE?
http://www.free-bullion-investment-guide.com/homesafes.html
Kreft i Norge 2015
2. Does it matter?
Reidentification threat
Trust
…versus…..
Current systems are based on trust
An example
Month and year of birth
Dates of all cervical exams
Results of each test
Whether or not the woman develops cancer
Cancer diagnosis date
1 million women
Month and year of birth
Dates of all cervical exams
Results of each test
Linked to identifiers
on n = xxx women
What do we do?
Trust?
All data deliveries based on trust
….or
What do we do?
Reduce reidentification threat
………………Exactly HOW??
Anonymization protocols
K-anonymization (categorizing variables)
Creating synthetic datasets
Fuzzification
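As a rough illustration of the first option, k-anonymization by categorizing variables, the sketch below groups birth years into 5-year bands and checks that every combination of values occurs at least k times. The helper names (`year_band`, `is_k_anonymous`), the band width and the toy data are assumptions for illustration, not taken from the talk.

```python
from collections import Counter

def year_band(birth_year, width=5):
    """Replace an exact birth year by a band such as '1970-1974'."""
    start = birth_year - birth_year % width
    return f"{start}-{start + width - 1}"

def is_k_anonymous(rows, k):
    """True if every distinct row (tuple of quasi-identifiers) occurs >= k times."""
    return all(count >= k for count in Counter(rows).values())

# Toy data (invented): birth years of five women.
birth_years = [1960, 1961, 1962, 1971, 1972]
exact = [(y,) for y in birth_years]
banded = [(year_band(y),) for y in birth_years]

print(is_k_anonymous(exact, k=2))   # every exact year is unique here
print(is_k_anonymous(banded, k=2))  # the two bands hold 3 and 2 women
```

Wider bands make larger groups but leave less detail for analysis, so the band width is a trade-off rather than a fixed rule.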
Synthetic datasets
Reset all dates from reference date
Day of birth = day 0
Day started using drug (before diagnosis) = day 19 345
Day diagnosed with cancer = day 20 693
Challenge:
If some aspect of calendar year is needed
(treatments change over time)
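The date reset described on this slide can be sketched as follows: every date is replaced by the number of days since a reference date, here the date of birth (day 0). The function name `to_day_offsets`, the event names and the example dates are assumptions for illustration only.

```python
from datetime import date

def to_day_offsets(reference, events):
    """Express each event date as days since `reference`; missing dates stay None."""
    return {
        name: None if d is None else (d - reference).days
        for name, d in events.items()
    }

# Invented example, not data from the talk.
birth = date(2000, 1, 1)
events = {"birth": birth, "exam_1": date(2001, 1, 1), "diagnosis": None}
print(to_day_offsets(birth, events))
```

Intervals between events are preserved exactly, but the calendar year is gone, which is the challenge named on the slide.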
Fuzzification – alter the data
- K-anonymization (Categorized variables)
- Excluded some observations (extreme dates/combinations)
- ALTERED all dates:
- Removed DAY
- CHANGED month – with random number (fuzzy factor)
- REMOVED month of birth
Fuzzification of cervix data
Ursin et al., Cancer Epidemiology Biomarkers Prevention 2017
Fuzzification – alter the data
• 5.6 million records
• All cervical exam dates
• Results
• Diagnosis dates of cancer
• 915 000 women
Fuzzification – alter the data
• Removed extreme dates/combinations
• Set day in dates to 15
• Used fuzzy factor on month:
• random value between -4 and +4
• All dates for one individual were changed by the
same random number
Original ID      DOB       Exam 1    Exam 2      Diagnosis date
01071972 23456   1/7/1972  2/8/2000  10/11/2004  21/1/2007
03041960 45678   3/4/1960  5/1/1995  10/2/1998   ----

ID   DOB        Exam 1     Exam 2      Diagnosis date
001  15/7/1972  15/8/2000  15/11/2004  15/1/2007
002  15/4/1960  15/1/1995  15/2/1998   ----

Allocated ID  DOB         Exam 1      Exam 2      Diagnosis date
1023          15/10/1972  15/11/2000  15/2/2005   15/4/2007
4567          15/1/1960   15/11/1994  15/12/1997  ----
Fuzzification – alter the data
Allocated ID  DOB   Exam 1      Exam 2      Diagnosis date
1023          1972  15/11/2000  15/2/2005   15/4/2007
4567          1960  15/11/1994  15/12/1997  ----
FINAL DATA
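The date steps of the fuzzification (set the day to 15, then shift the month by one random value per individual) can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, and the offset is assumed to be a uniformly drawn integer from -4 to +4, which the slides do not specify. Reducing the date of birth to the year only is left out.

```python
import random
from datetime import date

def shift_month(d, offset):
    """Set the day to 15 and move the date by `offset` whole months."""
    months = d.year * 12 + (d.month - 1) + offset
    return date(months // 12, months % 12 + 1, 15)

def fuzzify_person(dates, max_shift=4, offset=None, rng=random):
    """Apply one shared month offset to all dates of a single individual.

    If `offset` is None it is drawn uniformly from -max_shift..+max_shift
    (an assumption; the slides only say "random value between -4 and +4").
    Missing dates (None) are left as None.
    """
    if offset is None:
        offset = rng.randint(-max_shift, max_shift)
    return [None if d is None else shift_month(d, offset) for d in dates]

# First row of the worked example: DOB, exam 1, exam 2, diagnosis date.
row = [date(1972, 7, 1), date(2000, 8, 2), date(2004, 11, 10), date(2007, 1, 21)]
print(fuzzify_person(row, offset=3))
```

With the offset fixed at +3 the output corresponds to the row with allocated ID 1023 in the worked example. Because one offset is shared by all dates of a person, the intervals between her exams and diagnosis stay correct to within the monthly rounding.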
Assessing the risk of reidentification
• ARX tool
• Quantifies risk of re-identification based on
uniqueness
• Prosecutor scenario: assumes the person is known to be in the dataset
• Classify variables as identifiable, quasi-identifiable
or sensitive
Prasser F, Kohlmayer F, Lautenschlager R, Kuhn KA. ARX--A
Comprehensive Tool for Anonymizing Biomedical Data. AMIA Annu
Symp Proc. 2014;2014:984-93.
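The idea of uniqueness-based risk can be illustrated without ARX itself. In the prosecutor scenario, the risk for a record is commonly taken as 1 divided by the number of records that share its quasi-identifier values. The sketch below is plain Python, not the ARX API; the function name, the field names and the toy records are assumptions for illustration.

```python
from collections import Counter

def prosecutor_risk(records, quasi_identifiers):
    """Summarise per-record risk 1 / (size of the record's equivalence class)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    risks = [1 / sizes[k] for k in keys]
    return {
        "max_risk": max(risks),
        "avg_risk": sum(risks) / len(risks),
        "share_unique": sum(1 for k in keys if sizes[k] == 1) / len(keys),
    }

# Toy records (invented): two women share all values, two are unique.
records = [
    {"birth_year": 1972, "exam_month": "2000-08"},
    {"birth_year": 1972, "exam_month": "2000-08"},
    {"birth_year": 1960, "exam_month": "1995-01"},
    {"birth_year": 1960, "exam_month": "1998-02"},
]
print(prosecutor_risk(records, ["birth_year", "exam_month"]))
```

Coarser quasi-identifiers, such as dates reduced to month and year, tend to enlarge the groups and so lower these figures.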
Assessing the reidentification risk
• D1. Realistic dataset
• D2. k-anonymization of dataset D1
• changing all dates in the dataset to 15th of the month
• D3. Fuzzifying the month in D2
• by adding a random factor between -4 and +4 months to each
month.
Fuzzification – WHAT helps?
Ursin et al., Cancer Epidemiology Biomarkers Prevention 2017
Reidentification risk
• Simple step reduces the risk of reidentification
• Adding a fuzzy factor makes reidentification even
more difficult
The degree of personal identification shall not be greater than
necessary for the purpose in question. The degree of personal
identification shall be justified. The supervisory authority may
require the data controller to present the justification.
• Helseregisterloven (Norwegian Health Registry Act) §6
Current regulations
EU – GDPR:
Data Protection Impact Assessment
Article 35
Current practice - examples
Cancer Registry: Restrictive with dates
Helseregisterloven §6
Prescription Registry: Restrictive with dates
§4 "Prohibition on simultaneous access"
(Day differences, "differansedager" = synthetic dataset)
Statistics Norway: ?
Common guidelines - and
better solutions - needed!
Income?
Large linkages continue
…..still based on trust
Can NOT build a national platform on TRUST alone
For the researchers…….
BALANCE
The researchers need:
Safe analysis
of large linked data
(no reidentification threat)
- Rapid and seamless analyses
- Ability to check individual records
Need national platforms that can do it all!
Thank you
Fuzzy paper:
Mari Nygård
Sagar Sen
Jean-Marie Mottu
Discussions with:
Jan Nygård
Bjørn Møller
Hilde Olav
Johanne Gulbrandsen
Datautleveringsenheten
Livmorhalsprogrammet

BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Computation

  • 1. Health data and the re-identification threat – a real world example Giske Ursin Cancer Registry of Norway March 5, 2018 Seminar on privacy-preserving distributed statistical computation, Statistics Norway
  • 2. Norwegian health registries 17 central + 54 clinical registries Purpose: - Assess distribution of disease - Obtain information on how to prevent disease and death from disease Other health data: - Population surveys - 360+ biobanks Cancer screening programs: all women 25-69
  • 4. 30/04/2018 Put the data somewhere safe… Can only access them there…. But….
  • 5. Kreft i Norge 2015 1. Are the data safe? National platform coming….
  • 6. 1. Are the data really SAFE? http://www.free-bullion-investment-guide.com/homesafes.html
  • 7. Kreft i Norge 2015 2. Does it matter? Reidentification threat Trust …versus…..
  • 8. Current systems are based on trust
  • 10. An example Month and year of birth Dates of all cervical exams Results of each test Whether or not they get cancer Cancer diagnosis date 1 million women
  • 11. An example Month and year of birth Dates of all cervical exams Results of each test Whether or not they get cancer Cancer diagnosis date 1 million women Month and year of birth Dates of all cervical exams Results of each test Linked to identifiers on n = xxx women
  • 12. What do we do? Trust? All data deliveries based on trust ….or
  • 13. What do we do? Reduce reidentification threat ………………Exactly HOW??
  • 14. Anonymization protocols K-anonymization (categorizing variables) Creating synthetic datasets Fuzzification
  • 15. Synthetic datasets Reset all dates from reference date Day of birth = day 0 Day started using drug (before diagnosis) = day 19 345 Day diagnosed with cancer = day 20 693 Challenge: If some aspect of calendar year is needed (treatments change)
  • 16. Fuzzification – alter the data - K-anonymization (Categorized variables) - Excluded some observations (extreme dates/combinations) - ALTERED all dates: - Removed DAY - CHANGED month – with random number (fuzzy factor) - REMOVED month of birth
  • 17. Fuzzification of cervix data Ursin et al., Cancer Epidemiology Biomarkers Prevention 2017
  • 18. Fuzzification – alter the data • 5.6 million records • All cervical exam dates • Results • Diagnosis dates of cancer • 915 000 women Ursin et al., Cancer Epidemiology Biomarkers Prevention 2017
  • 19. Fuzzification – alter the data • Removed extreme dates/combinations • Set day in dates to 15 • Used fuzzy factor on month: • random value between -4 and +4 • All dates for one individual changed with the same random number Ursin et al., Cancer Epidemiology Biomarkers Prevention 2017
  • 20. Fuzzification – alter the data
      Original
        ID              DOB         Exam 1      Exam 2      Diagnosis date
        01071972 23456  1/7/1972    2/8/2000    10/11/2004  21/1/2007
        03041960 45678  3/4/1960    5/1/1995    10/2/1998   ----
      Day set to 15
        ID              DOB         Exam 1      Exam 2      Diagnosis date
        001             15/7/1972   15/8/2000   15/11/2004  15/1/2007
        002             15/4/1960   15/1/1995   15/2/1998   ----
      Months shifted (fuzzy factor)
        Allocated ID    DOB         Exam1       Exam2       Diagnosis date
        1023            15/10/1972  15/11/2000  15/2/2005   15/4/2007
        4567            15/1/1960   15/11/1994  15/12/1997  ---
      FINAL DATA
        Allocated ID    DOB         Exam1       Exam2       Diagnosis date
        1023            1972        15/11/2000  15/2/2005   15/4/2007
        4567            1960        15/11/1994  15/12/1997  ---
  • 21. Fuzzification – alter the data
      Original
        ID              DOB         Exam 1      Exam 2      Diagnosis date
        01071972 23456  1/7/1972    2/8/2000    10/11/2004  21/1/2007
        03041960 45678  3/4/1960    5/1/1995    10/2/1998   ----
      FINAL DATA
        Allocated ID    DOB         Exam1       Exam2       Diagnosis date
        1023            1972        15/11/2000  15/2/2005   15/4/2007
        4567            1960        15/11/1994  15/12/1997  ---
  • 22. Assessing the risk of reidentification • ARX tool • Quantifies risk of re-identification based on uniqueness • Prosecutor scenario: assumes person in dataset • Classify variables as identifiable, quasi-identifiable or sensitive Prasser F, Kohlmayer F, Lautenschlager R, Kuhn KA. ARX--A Comprehensive Tool for Anonymizing Biomedical Data. AMIA Annu Symp Proc. 2014;2014:984-93.
  • 23. Assessing the reidentification risk • D1. Realistic dataset • D2. k-anonymization of dataset D1 • changing all dates in the dataset to 15th of the month • D3. Fuzzifying the month in D2 • by adding a random factor between -4 and +4 months to each month.
  • 26. Fuzzification – WHAT helps? Ursin et al., Cancer Epidemiology Biomarkers Prevention 2017
  • 27. Reidentification risk • A simple step (setting all days to the 15th of the month) reduces the risk of reidentification • Adding a fuzzy factor makes reidentification even more difficult
  • 28. Current regulations Helseregisterloven (Norwegian Health Registry Act) §6: "The degree of personal identification shall not be greater than necessary for the purpose in question. The degree of personal identification shall be justified. The supervisory authority may require the data controller to present the justification." EU – GDPR: Data Protection Impact Assessment, Article 35
  • 29. Current practice - examples Cancer Registry: Restrictive with dates Helseregisterloven §6 Prescription Registry: Restrictive with dates §4 «Forbud mot samtidig tilgang» (prohibition of simultaneous access) (Differansedager, i.e. difference in days = synthetic dataset) Statistics Norway: ? Common guidelines - and better solutions - needed! Income?
  • 30. Large linkages continue …..still based on trust Can NOT build a national platform on TRUST alone
  • 32. The researchers need: Safe analysis of large linked data (no reidentification threat) - Rapid and seamless analyses - Ability to check individual records Need national platforms that can do it all!
  • 33. Thank you Fuzzy paper: Mari Nygård Sagar Sen Jean-Marie Mottu Discussions with: Jan Nygård Bjørn Møller Hilde Olav Johanne Gulbrandsen Datautleveringsenheten Livmorhalsprogrammet
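
The fuzzification steps on slides 19 and 20 (set the day to 15, shift every date of one individual by the same random number of months, keep only the year of birth) can be sketched roughly as follows. This is a minimal illustration, not the code used for the published paper: the function names `shift_months` and `fuzzify_individual` are hypothetical, and it assumes the fuzzy factor is a whole number drawn uniformly from -4 to +4 with 0 allowed, which the slides do not specify.

```python
import random
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Shift a date by a whole number of months, keeping the day of month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, d.day)


def fuzzify_individual(dob, events, rng, max_shift=4):
    """Fuzzify all dates belonging to one individual.

    dob: date of birth; events: list of dates (None = no event, e.g. no diagnosis).
    Returns (year_of_birth, fuzzified_events).
    """
    # Assumption: one integer fuzzy factor per individual, uniform on [-4, +4].
    shift = rng.randint(-max_shift, max_shift)
    # Day is set to 15 first, so the shifted date is valid in every month.
    fuzzy_dob = shift_months(dob.replace(day=15), shift)
    fuzzy_events = [
        None if e is None else shift_months(e.replace(day=15), shift)
        for e in events
    ]
    # Month of birth is removed: only the (shifted) year is kept.
    return fuzzy_dob.year, fuzzy_events
```

Because the same shift is applied to all dates of an individual, the number of months between two exams, or between an exam and a diagnosis, is unchanged, which is presumably what keeps the data usable for analysis.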

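Slide 22 describes quantifying re-identification risk from uniqueness under the prosecutor scenario. A common way to express this, sketched here under stated assumptions, is to give each record a risk of 1 divided by the number of records sharing its combination of quasi-identifiers. This is an illustration of the idea only; it does not use or reproduce the ARX tool or its API, and the names `prosecutor_risks` and `summarize` are hypothetical.

```python
from collections import Counter


def prosecutor_risks(records, quasi_identifiers):
    """Per-record risk = 1 / size of the record's equivalence class.

    records: list of dicts; quasi_identifiers: keys an attacker is assumed to know.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def summarize(risks):
    """Summarize per-record risks for a whole dataset."""
    n = len(risks)
    return {
        "highest_risk": max(risks),
        "average_risk": sum(risks) / n,
        # A risk of 1.0 means the record is unique on its quasi-identifiers.
        "share_unique": sum(1 for r in risks if r == 1.0) / n,
    }
```

Running the same summary on the three datasets from slide 23 (D1, D2, D3), with birth date and exam dates treated as quasi-identifiers, would show how much each step shrinks the share of unique records.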
Editor's Notes

  1. We have a lot of health data. First and foremost, many health registries. 17 central health registries: the birth registry, the cause of death registry, the Norwegian Patient Registry, the prescription registry, the cancer registry, and so on. Then quality registries for various diseases. Data collected to map….. In addition, other health data.