Pseudonymised Matching:
Robustly Linking Molecular and
Prescription Data to Cancer
Registry Data in England
Brian Shand, Fiona McRonald, Katherine Henson, Cong Chen
(Public Health England)
Overview
• Motivation: matching patients between data feeds is
challenging
• The OpenPseudonymiser approach to pseudonymisation
with one-way hash functions
• Extending OpenPseudonymiser with encrypted
demographics
• Results: linkage of national prescription data, BRCA
mutation screening data
• Conclusion
2 Pseudonymised matching: robustly linking molecular and prescription data to cancer registry data in England
Motivation – information needs
• Cancer registry data is extremely sensitive, and challenging to link:
• The English cancer registration service (NCRAS) cannot reveal who has
cancer to external providers
• External providers cannot give identifiable data for patients without
cancer – NCRAS can however hold data on patients with (suspicion of)
cancer
• This makes sensitive feeds without a cancer marker difficult to access, e.g.
national prescription data, BRCA molecular screening data
• Screening for mutations in the BRCA1 or BRCA2 genes identifies people
with an increased risk of developing breast and/or ovarian cancer.
• 50%-65% of women with a BRCA1 mutation develop breast cancer by
age 70, and 35%-46% develop ovarian cancer.
• If patients develop cancer later, the mutation data would add value.
Key idea
• We want to pseudonymise cancer registry
data and another data source in the same
way:
• If the same patient is in both data sources,
they will get the same pseudo-id.
• Demographics / sensitive fields can be
encrypted, so that only a trusted party –
who also knows the linkage demographics
– can decrypt them.
• Non-demographic fields are generally not
disclosive, and do not need to be
encrypted (at least within our secure
cancer database).
Useful concepts
• Hashing
• Irreversible scrambling algorithm
• Secret salt
• Information making hashing context-specific
• Reversible encryption
Hashes and OpenPseudonymiser
• We start with the OpenPseudonymiser approach, which uses SHA-256 to
generate pseudonyms for each patient:
• SHA-256 is a cryptographically secure one-way hash function:
• given x, it’s straightforward to compute y = sha256(x), but
• given y, it’s computationally infeasible to reconstruct x, other than by
trying all possible inputs by brute force.
• The pseudonyms are secure if the salt is secret and “long enough” (e.g.
256 bits of random data).
• Replace each patient identifier with a pseudonym, derived from the NHS
number (national healthcare identifier)
• researchers can link their datasets, without sharing patient demographics
• pseudonym = sha256(NHS number + salt)
E.g. sha256('1234567881' + 'ab00ec62fa2ad275b08471cbfc76cb8580f92283f3663baff0ea7d83aee57e19')
= '778aebfe72aefcf391d0096333bf325837981ba60ba8a5921be37789307321d3'
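The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the OpenPseudonymiser implementation itself: the function names `make_salt` and `pseudonymise` are hypothetical, and the choice to encode the concatenated string as UTF-8 with no separator is an assumption, so the digest produced here is not guaranteed to reproduce the example value on the slide.

```python
import hashlib
import secrets


def make_salt() -> str:
    """Return 256 bits of random data as a 64-character hex string."""
    return secrets.token_hex(32)


def pseudonymise(nhs_number: str, salt: str) -> str:
    """Return the hex SHA-256 digest of the NHS number followed by the salt."""
    # Assumption: the two strings are joined with no separator and UTF-8 encoded.
    return hashlib.sha256((nhs_number + salt).encode("utf-8")).hexdigest()


salt = make_salt()
pseudo_id = pseudonymise("1234567881", salt)
print(len(pseudo_id))  # 64 hex characters, i.e. 256 bits
```

The same NHS number and salt always give the same pseudonym, which is what allows two separately pseudonymised datasets to be linked.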
OpenPseudonymiser
• Research teams use the same salt as a shared secret.
• Patients with the same NHS number will be given the same pseudonym
778aebf…d3 <=> 778aebf…d3
• Without knowing the salt, the pseudonyms are non-identifiable.
• Ordinary researchers cannot access the salt: only a trusted linkage
function can use it, and its secrecy is contractually agreed.
(If the salt is known, a brute-force attack could be possible.)
• OpenPseudonymiser only protects the key demographics (NHS number);
the clinical data is treated as non-identifiable.
• Patients must match exactly by NHS number (or whatever demographics
tuple is used for matching purposes, e.g. postcode + date of birth).
OpenPseudonymiser does not support complex patient matching (e.g. NHS
number + surname + month and year of birth).
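The exact-match behaviour described on this slide can be illustrated with a small sketch. It assumes the same hypothetical `pseudonymise` helper as a plain SHA-256 of identifier plus salt; the `link` function, the salt value and the record contents are invented for illustration only. Two datasets pseudonymised with the shared salt join on the pseudonym, while a record with two transposed digits in the NHS number produces an unrelated pseudonym and does not link.

```python
import hashlib


def pseudonymise(identifier: str, salt: str) -> str:
    """Hex SHA-256 of identifier + salt (UTF-8 encoding is an assumption)."""
    return hashlib.sha256((identifier + salt).encode("utf-8")).hexdigest()


def link(registry: dict, feed: dict) -> dict:
    """Join two pseudonymised datasets on the pseudonyms they have in common."""
    return {p: (registry[p], feed[p]) for p in registry.keys() & feed.keys()}


# Hypothetical shared secret; a real salt would be long, random and kept secret.
SALT = "example-shared-salt"

registry = {pseudonymise("1234567881", SALT): "registry record A"}
feed = {
    pseudonymise("1234567881", SALT): "prescription record 1",
    # Last two digits transposed: the pseudonym no longer matches.
    pseudonymise("1234567818", SALT): "prescription record 2",
}

linked = link(registry, feed)
print(len(linked))  # 1: only the exact NHS number match links
```

Because any difference in the input changes the whole digest, there is no notion of a "near" match between pseudonyms, which is why fuzzy matching needs additional information.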
Extending OpenPseudonymiser
• We have extended OpenPseudonymiser-like pseudonymisation to support
fuzzy patient matching, and clinical data encryption.
• As in OpenPseudonymiser, pseudonyms identify possible matches, i.e.
records in which the registry has a legitimate interest.
• We use the plaintext linkage demographics to generate a secondary
encryption key, e.g.
• per-record encryption keys are used for additional demographics, and
clinical data
• keys combine patient pseudonym, random key, and additional salt
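One way to picture a per-record key of this kind is sketched below. The slides do not give the exact construction, so everything here is an assumption made for illustration: the function name `derive_record_key`, the use of a second salt (`key_salt`), the `|` separator, and hashing the plaintext NHS number together with the pseudonym and a per-record random value. The point it shows is only that a key derived this way cannot be recomputed by someone who holds the pseudonym but not the plaintext linkage demographics.

```python
import hashlib
import secrets


def pseudonymise(nhs_number: str, salt: str) -> str:
    """Hex SHA-256 of NHS number + salt (encoding is an assumption)."""
    return hashlib.sha256((nhs_number + salt).encode("utf-8")).hexdigest()


def derive_record_key(nhs_number: str, pseudonym: str,
                      random_key: str, key_salt: str) -> bytes:
    """Hypothetical derivation of a 256-bit per-record key.

    Combines the plaintext linkage demographic, the patient pseudonym,
    a per-record random value and an additional salt.
    """
    material = "|".join([nhs_number, pseudonym, random_key, key_salt])
    return hashlib.sha256(material.encode("utf-8")).digest()


id_salt = secrets.token_hex(32)      # salt used for the pseudonym
key_salt = secrets.token_hex(32)     # additional salt used only for keys
random_key = secrets.token_hex(32)   # per-record value stored with the record

pseudonym = pseudonymise("1234567881", id_salt)
key = derive_record_key("1234567881", pseudonym, random_key, key_salt)
print(len(key))  # 32 bytes, suitable as a 256-bit symmetric key
```

In a real system the resulting key would be passed to a standard authenticated cipher to encrypt the additional demographics and clinical data; that step is omitted here because the slides do not specify the cipher.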
Extending OpenPseudonymiser 2
• The cancer registry keeps an isolated database of pseudonymised data and
keys, to match registry patients against.
• Where the core demographics match, the remaining demographics will
be unpacked, and used for fuzzy patient matching.
• If the demographics match score is high enough, the clinical data will be
unpacked and released to the ENCORE cancer registration database.
• No access to identifiable data for patients not suspected to have cancer
• The pseudonymised dataset itself can also be used for baseline
comparisons, e.g. to compare how often a particular prescription drug was
dispensed to lung cancer patients, vs the overall population.
• By including patient age as a derived, non-disclosive field in the
pseudonymised data, baseline comparisons can be age standardised.
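The release decision described above (unpack the remaining demographics, score the match, release only above a threshold) might look roughly like the following. This is a sketch under stated assumptions, not the registry's scoring method: the `Demographics` fields, the weights, the `THRESHOLD` value and the use of `difflib.SequenceMatcher` for surname similarity are all hypothetical choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Demographics:
    surname: str
    date_of_birth: str  # ISO format, e.g. "1950-03-17"
    postcode: str


def normalise_postcode(postcode: str) -> str:
    """Remove spaces and upper-case so formatting differences are ignored."""
    return postcode.replace(" ", "").upper()


def match_score(a: Demographics, b: Demographics) -> float:
    """Return a score between 0 and 1; the weights are illustrative only."""
    surname_sim = SequenceMatcher(None, a.surname.lower(), b.surname.lower()).ratio()
    dob_match = 1.0 if a.date_of_birth == b.date_of_birth else 0.0
    postcode_match = (
        1.0 if normalise_postcode(a.postcode) == normalise_postcode(b.postcode) else 0.0
    )
    return 0.4 * surname_sim + 0.4 * dob_match + 0.2 * postcode_match


THRESHOLD = 0.85  # hypothetical cut-off for releasing clinical data


def should_release(registry: Demographics, feed: Demographics) -> bool:
    """Release clinical data only when the decrypted demographics agree closely."""
    return match_score(registry, feed) >= THRESHOLD


registry_patient = Demographics("Smith", "1950-03-17", "AB1 2CD")
same_person = Demographics("SMITH", "1950-03-17", "ab12cd")
other_person = Demographics("Jones", "1962-11-02", "EF3 4GH")

print(should_release(registry_patient, same_person))   # True
print(should_release(registry_patient, other_person))  # False
```

Scoring happens only after the pseudonyms have already matched, so the decrypted demographics act as a check against coincidental or erroneous NHS number matches rather than as the primary linkage key.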
Applications in PHE
• Public Health England has access to pseudonymised national prescription
data feeds from NHS Business Services Authority, and BRCA and other
genetic mutation screening data. These have been linked to the cancer
registry. Decrypted birthdates help validate NHS number matches.
• Four months of prescription data (332 million prescriptions, 29 million
people) matched 1.6 million cancer patients: 88% of living cancer
patients had a prescription record.
• We now have 47 months of prescription data linked to the cancer registry.
• Non-disclosive fields need not be pseudonymised, so the pseudonymised
dataset allows baseline comparisons against the cancer-linked cohort. For
BRCA screening data, this identified nearly 1,300 unique variants from 7,000
screening patients, and an overall variant detection rate of about 25%. In
prescription data, cancer patients were compared with age-matched
controls.
Conclusion
• Linking data from external sources to the cancer registry creates a powerful
resource to better understand patient experience over their lifetimes.
• Pseudonymised matching can help to unlock data sources which include
people without cancer.
• We have done this for prescribing and screening data.
• cong.chen@phe.gov.uk
• ncrasenquiries@phe.gov.uk