False promises of data anonymity
jeopardize data access
Neil Walker
12th September 2016
JDRF/Wellcome Trust Diabetes and Inflammation Laboratory
University of Cambridge
nmw24@cam.ac.uk
ORCID: http://orcid.org/0000-0001-9796-7688
Contents
In the context of clinical trial data, I’ll discuss
• The proposal to share individual-level data
• False promises of anonymity
• Consequences
• Alternatives
The proposal
In a Jan 2016 editorial, the International Committee
of Medical Journal Editors (ICMJE)
“Proposes to require authors to share with others the
deidentified individual patient data (IPD) underlying the
results presented in the article.”
Implementation delayed a year (post adoption) to allow new
consent models
… response to …
Institute of Medicine
(IoM) report (2015)
… which includes …
(p144, my emphasis):
“De-identification is commonly used to protect the privacy
of participants in a clinical trial (see also Appendix B).
Various jurisdictions may differ on the degree to which the
risk of re-identification must be reduced for the data to be
considered sufficiently de-identified to justify more
widespread sharing, particularly in the absence of specific
informed consent of the data subjects.”
[Appendix B is 54 pages on "Concepts and Methods for De-identifying Clinical
Trial Data", by Khaled El Emam, and Bradley Malin]
… and cites and relies on …
… which is a poor implementation of …
Why poor?
ISTDB-2 has 19435 participants, and 112 variables, e.g.:
Randomisation data;
HOSPNUM;Hospital number
RDELAY;Delay between stroke and randomisation in hours
RCONSC;Conscious state at randomisation (F - fully alert,
D - drowsy, U - unconscious)
SEX;"M=male; F=female"
AGE;Age in years
RSLEEP;Symptoms noted on waking (Y/N)
RATRIAL;"Atrial fibrillation (Y/N); not coded for pilot
phase - 984 patients"
...
COUNTRY;Abbreviated country code
CNTRYNUM;Country code
...
This should be enough people, right?
Let’s count the people from each country
$ cut -f82 IST_corrected.txt | sort | uniq -c | sort -nr
6257 UK
3437 ITAL
1631 SWIT
759 POLA
...
9 JAPA
2 FRAN
1 COUNTRY
NB: dataset superseded by ISTDB-3, currently
embargoed due to "UK NHS Information
Governance"
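The count above can be turned into a rough small-cell check. The sketch below is an illustration only, not part of the original analysis: the function name `small_cells` and the threshold `K` are assumptions. Unlike the pipeline above, it skips the header row (which shows up there as `1 COUNTRY`), and it assumes values in the chosen column contain no spaces.

```shell
# small_cells FILE COLUMN K   (hypothetical helper, not from the talk)
# Prints "count value" for every value of the tab-separated COLUMN
# that occurs fewer than K times. The first line of FILE is treated
# as a header and skipped.
small_cells() {
    file=$1
    col=$2
    k=$3
    tail -n +2 "$file" |          # drop the header row
        cut -f"$col" |            # keep only the column of interest
        sort | uniq -c |          # count occurrences of each value
        awk -v k="$k" '$1 < k { print $1, $2 }'   # keep the small cells
}
```

For example, `small_cells IST_corrected.txt 82 10` would list the countries contributing fewer than 10 participants; the threshold of 10 is arbitrary and chosen here purely for illustration.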
Is this just isolated sloppiness?
Note too that a dataset, once released,
cannot be retrieved
Examples - "Anecdata"
(from Daniel C Barth-Jones, to whom many thanks)
1. Governor Weld - identified in insurance dataset in 1997
2. Netflix - customers identified in a dataset released to improve
recommendations
3. Y-Chromosome STR surname inference -
demonstration from Yaniv Erlich's lab
4. PGP - subjects identified in (Open) Personal Genome Project
5. Washington State Hospital Discharge data -
patients identified in data sold by hospital
6. NYC Taxi - celebrities identified in FOIL request
7. Mobile phone - theoretical identification from mobile phone
location data
Failure modes?
• 1. and 4. are cases where too much data was
released (ZIP codes, DOBs)
• 6. and 7. were breached by linking multiple
records, each (probably) OK on its own - though 6.
had a key hacked too
But all rely on data available outside the dataset to
make the (often small number of) identifications -
some of it not obvious
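The reliance on outside data can be sketched as a simple linkage on quasi-identifiers. This is a minimal illustration under stated assumptions, not a reconstruction of any of the attacks listed: the function name `link_records`, the file layouts and the choice of ZIP code, date of birth and sex as the linking key are all hypothetical.

```shell
# link_records RELEASED PUBLIC   (hypothetical helper, not from the talk)
# Both files are tab-separated with no header row.
#   RELEASED: zip, dob, sex, sensitive value   ("de-identified" data)
#   PUBLIC:   zip, dob, sex, name              (outside data, e.g. a register)
# Prints "name<TAB>sensitive value" for every zip/dob/sex combination
# that occurs exactly once in RELEASED and also appears in PUBLIC.
link_records() {
    awk -F'\t' -v OFS='\t' '
        # First file: count each quasi-identifier key, remember its value.
        NR == FNR { key = $1 FS $2 FS $3; n[key]++; val[key] = $4; next }
        # Second file: a key seen exactly once in RELEASED is a unique match.
        { key = $1 FS $2 FS $3; if (n[key] == 1) print $4, val[key] }
    ' "$1" "$2"
}
```

Only combinations that are unique in the released file are reported, which is why such attacks often identify a small number of people rather than the whole dataset.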
So, de-identification is hotly debated…
“There is no evidence that de-identification works either
in theory or in practice and attempts to quantify its
efficacy are unscientific and promote a false sense of
security by assuming unrealistic, artificially constrained
models of what an adversary might do.”
How does this jeopardise data access?
• And not just bad publicity, though that doesn’t help!
Image from Fast Company
Data access issue #1:
where consent was not sought for data sharing
Data is being redacted e.g. from
https://clinicalstudydatarequest.com
GSK’s exclusion criteria include:
Whether GSK consider it feasible to anonymise the data
without compromising the privacy and confidentiality of
research participants. For example, anonymisation of
data from studies of rare diseases is more difficult to
achieve and will be reviewed on a case-by-case basis.
Data access issue #2: where there is no
experience of sharing data with consent
Should have lots of choices
Where is clinical data sharing now?
EBI and NIH like this …
Where should it be?
Aggregate
Consented, anonymised
Understanding Society (1), at UK Data Archive
Downloads, 2014:   3 | 285 | 2510
Datasets:          2 | 29 | 3
Time to decision:  3 months | 2 weeks | 1 day
Decision by:       DAC | Staff, reporting to DAC | Registration, delegated by DAC
1. https://www.understandingsociety.ac.uk/documentation/getting-started
i.e. some people do it well
Data access issue #3: no elegant way
to respond to a new attack
This paper led to all genotype summary statistics being placed behind firewalls
Data access issue #4: people take risks
STOP PRESS - September 7th 2016
NHGRI give up on access control?
https://www.genome.gov/director/
https://www.genome.gov/27566089/Workshop-on-Sharing-Aggregate-Genomic-Data
“NHGRI should recommend that NIH reconsider the policy for
maintaining all genomic summary statistics under controlled
access, and develop a default public access model based on
transparent policy considerations for most genomics studies.”
The elephant in the room?
From Banksy’s Barely Legal
show, LA, 2006
Anonymous data is seen as an asset to
buy and sell
However not all subjects will agree to data sharing,
with a recent (health-data-related) poll finding 17%
“objected to private companies having
access to health data under any
circumstances.”
(Ipsos MORI 2016)
So, to repeat: this is not a matter of
consent or anonymise
Do both