Application of Probabilistic Linkage Methods_Join Infectious Disease Surveillance Records-Death Registrations_PVERConf_May2011

PowerPoint Presentation from May 2011 Personal Validation and Entity Resolution Conference. Presenters: T. Lamagni, N. Potz, D. Powell, N. Hinton, A. Grant, E. Sheridan, R. Pebody. Presentation Title: Application of probabilistic linkage methods to join infectious disease surveillance records to death registrations

  • Stages of matching: pre-match preparation = formatting, blocking, weighting etc. Explanation of graph: the distribution of total weights of record pairs is roughly bimodal, i.e. two overlapping populations. The region where they overlap is a grey area that requires manual checking; pairs above it are good matches and pairs below it are non-matches. This gives us matched and unmatched pairs, plus a set of records for manual checking.

Presentation Transcript

  • Application of probabilistic linkage methods to join infectious disease surveillance records to death registrations
    • T Lamagni, N Potz, D Powell, N Hinton, A Grant, E Sheridan, R Pebody
    • Healthcare-Associated Infection & Antimicrobial Resistance Department
  • overview
    • data sharing between organisations
    • use of probabilistic linkage methods for study on infectious disease deaths
    • further uses of probabilistic linkage
    • summary and conclusions
  • data sharing between public bodies
    • multitude of potential benefits to sharing of data between agencies including:
    • accessing new information
    • reducing demands on suppliers of data
    • Data Protection Act 1998 (UK) allows data sharing, depending on owner / recipient legal status
    • Department for Constitutional Affairs ‘Information sharing vision statement’, 2006:
      • “Government is committed to more information sharing between public sector organisations and service providers”
  • challenges of data sharing
    • ethical concerns
    • disclosure of personal information between organisations raises concerns over potential erosion of rights to privacy
    • in UK, data sharing is regulated through Information Commissioner’s Office
    • technical barriers
    • size of datasets often very large
    • datasets may lack common unique identifier
  • research study on mortality associated with MRSA infection
    • Collaborative project between Health Protection Agency and Office for National Statistics (2005-07).
    • aims of linkage study
    • estimate case fatality following meticillin-resistant Staphylococcus aureus (MRSA) bacteraemia
    • undertake analysis of death certification practice
    • provide sampling frame for confidential investigation
    • objectives
    • develop mechanism to match death registrations to MRSA records
    • carry out an independent evaluation of method
    • use linked data in fulfilment of aims given above
  • matching death registrations to infection records 2004-05
    • Method needed to link datasets taking into account:
            • lack of unique identifier in majority of infection records
            • errors in patient identifiers
            • size of datasets (MRSA: n=10,305; death registrations: n=1,153,221)
    • Variables available for matching:
      variable           coding format                 completion of variable (%)
                                                       infection records   death registrations
      NHS number         10-digit (validity checked)   29.6                99.9
      Forename initial   Single letter (A–Z)           96.8                100
      Surname Soundex    Letter + 3 digits             97.7                100
      Sex                1 (male), 2 (female)          97.9                100
      Date of birth      DD/MM/YYYY                    99.0                100
      Postcode           Letter prefix only            51.4                99.8
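The "Surname Soundex" format in the table (letter + 3 digits) can be illustrated with a minimal sketch. The slides do not say which Soundex variant was used, so the standard American Soundex rules below, and the function name `soundex`, are assumptions made for illustration only.

```python
def soundex(surname: str) -> str:
    """Encode a surname as a letter + 3 digits (standard American Soundex, assumed variant)."""
    # Build the consonant-to-digit lookup; vowels, H, W and Y have no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    name = [c for c in surname.upper() if c.isalpha()]
    if not name:
        return ""

    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after a vowel is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros


print(soundex("Robert"), soundex("Rupert"))
```

Surnames that sound alike ("Robert", "Rupert") receive the same code, which is why a Soundex code tolerates some spelling errors when it is used as a blocking or matching variable.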
  • probabilistic matching method
    • Method developed to link large volumes of data that contain errors and omissions, using the cumulative value of the information available.
    • Testing of matching undertaken using invasive Streptococcus pneumoniae infection (n=1252) to allow independent evaluation using NHS Central Register Tracing (patient surname needed for tracing).
    • Matching steps:
      • Acquisition of infection and mortality data
      • Pre-match preparation including blocking (to reduce computational demand) & weighting of matching variables
      • Match records (SQL server) and calculate total weights for each record pair
      • Build linked file
    • [Figure: distribution of total weight of record pairs, divided into non-matches, query matches and good matches]
  • blocking and weighting variables
    • Block: infection data and mortality data are blocked by SOUNDEX* (e.g. A1, A2, …) and, separately, by year of birth (e.g. 1941, 1942, …)
    • Match: matching variables (e.g. patient identifiers) compared within each matched pair of records
    • Weight: weights are based on the likelihood of each value representing a true match (weight of matched SOUNDEX*, weight of matched year of birth)
    • Example weights:
      Format            Infection data   Mortality data   Weight
      SOUNDEX*          A112             A112             +17.2
      SOUNDEX*          A112             A420             -8.0
      year of birth     1941             1941             +6.8
      …                 …                …                + …
    • * code based on surname
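The blocking and weighting idea above can be sketched as follows. This is a minimal illustration, not the SQL Server implementation used in the study: the record layout, the function names `block_pairs` and `total_weight`, the year-of-birth disagreement weight, and the rule that missing values contribute nothing are all assumptions. Only the Soundex weights (+17.2, -8.0) and the year-of-birth agreement weight (+6.8) come from the slide's example.

```python
from collections import defaultdict

# (agreement weight, disagreement weight) for each matching variable.
# Soundex weights and the year-of-birth agreement weight are the slide's examples;
# the year-of-birth disagreement weight (-5.0) is a made-up placeholder.
WEIGHTS = {
    "soundex": (17.2, -8.0),
    "yob": (6.8, -5.0),
}


def block_pairs(infections, deaths, key):
    """Yield only those (infection, death) pairs that share a value of the blocking key."""
    index = defaultdict(list)
    for death in deaths:
        if death.get(key) is not None:
            index[death[key]].append(death)
    for infection in infections:
        for death in index.get(infection.get(key), []):
            yield infection, death


def total_weight(a, b, weights=WEIGHTS):
    """Sum agreement/disagreement weights over all matching variables of a pair."""
    total = 0.0
    for var, (agree, disagree) in weights.items():
        if a.get(var) is None or b.get(var) is None:
            continue  # assumption: a missing value adds no evidence either way
        total += agree if a[var] == b[var] else disagree
    return total


infections = [{"id": "i1", "soundex": "A112", "yob": 1941}]
deaths = [
    {"id": "d1", "soundex": "A112", "yob": 1941},
    {"id": "d2", "soundex": "A420", "yob": 1941},
    {"id": "d3", "soundex": "A112", "yob": 1942},
]

# Blocking by year of birth: d3 is never compared with i1 in this pass.
for inf, dth in block_pairs(infections, deaths, "yob"):
    print(inf["id"], dth["id"], round(total_weight(inf, dth), 1))
```

Blocking keeps the number of comparisons manageable (only records in the same block are paired), while running a second pass with a different blocking key, here `"soundex"`, recovers pairs such as (i1, d3) that the first pass would miss.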
  • post-matching stages
    • inputs: matched record pairs from SOUNDEX blocking and matched record pairs from year of birth blocking
    • merge and de-duplicate
    • set threshold for auto accept/reject
    • manually check pairs in ‘grey zone’
    • final matched dataset
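The post-matching stages can be sketched in a few lines. The tuple layout, the function names `merge_and_dedupe` and `classify`, and the threshold values are hypothetical; the slides do not state the thresholds that were actually used.

```python
def merge_and_dedupe(*pair_sets):
    """Merge scored pairs from several blocking passes, keeping one entry per record pair."""
    merged = {}
    for pairs in pair_sets:
        for infection_id, death_id, weight in pairs:
            # The same pair found in two passes is assumed to carry the same total weight.
            merged[(infection_id, death_id)] = weight
    return merged


def classify(weight, lower, upper):
    """Auto-accept at or above `upper`, auto-reject below `lower`, else send for manual checking."""
    if weight >= upper:
        return "accept"
    if weight < lower:
        return "reject"
    return "manual"


from_soundex_blocking = [("i1", "d1", 24.0), ("i1", "d3", 9.2)]
from_yob_blocking = [("i1", "d1", 24.0), ("i1", "d2", -1.2)]

pairs = merge_and_dedupe(from_soundex_blocking, from_yob_blocking)
LOWER, UPPER = 5.0, 15.0  # hypothetical thresholds bounding the 'grey zone'
for (infection_id, death_id), weight in sorted(pairs.items()):
    print(infection_id, death_id, weight, classify(weight, LOWER, UPPER))
```

Setting `lower` equal to `upper` removes the manual-checking zone, which corresponds to a single accept/reject threshold.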
  • evaluation of probabilistic matching vs NHS Central Register Tracing
    • [Figure not reproduced; label: manual checking zone]
    • Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.
  • probability of true match according to distribution of total weight scores
    • [Figure not reproduced]
    • Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.
  • evaluation of probabilistic matching vs NHS Central Register Tracing
    • +ve predictive value 97.7% (465/476) to 99.8% (465/466)
    • -ve predictive value 90.2% (692/767) to 97.9% (692/707)
    • Results (rows: probabilistic record linkage; columns: NHS CR Tracing):
      Probabilistic record linkage     Traced - Dead   Traced - Not dead   Not traced   Total
      Matched to a death record        465             1                   10           476
      Not matched to a death record    15              692                 60           767
      Total                            480             693                 70           1243
    • Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.
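The predictive value ranges quoted above can be reproduced from the table counts. In this sketch the lower bound keeps the untraced records in the denominator and the upper bound leaves them out; that reading is inferred from the quoted fractions (465/476 vs 465/466, 692/767 vs 692/707), and the function name `predictive_value_range` is hypothetical.

```python
def predictive_value_range(correct, incorrect, not_traced):
    """Return (lower, upper) predictive value, with and without untraced records in the denominator."""
    lower = correct / (correct + incorrect + not_traced)
    upper = correct / (correct + incorrect)
    return lower, upper


# Matched to a death record: 465 traced dead, 1 traced not dead, 10 not traced.
ppv = predictive_value_range(correct=465, incorrect=1, not_traced=10)
# Not matched to a death record: 692 traced not dead, 15 traced dead, 60 not traced.
npv = predictive_value_range(correct=692, incorrect=15, not_traced=60)

print(f"+ve predictive value {ppv[0]:.1%} to {ppv[1]:.1%}")
print(f"-ve predictive value {npv[0]:.1%} to {npv[1]:.1%}")
```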
  • interval between diagnosis of MRSA bacteraemia and death, England 2004-5
    • 7 day case fatality rate = 20%
    • 30 day case fatality rate = 38%
    • Lamagni TL, et al. Mortality in patients with MRSA bacteraemia, England 2004-05. J Hosp Infect 2011;77:16-20.
  • Kaplan-Meier time to death following invasive S. pyogenes infection, England & Wales 2003-04
    • [Figure not reproduced]
    • Lamagni TL et al. Predictors of death after severe Streptococcus pyogenes infection. Emerg Infect Dis 2009;15(8):1304-7.
  • further application of probabilistic linkage
    • De-duplication of routine surveillance data: new probabilistic matching system implemented in July 2009
    • Linkage to other health datasets: surveillance data linkage to external health datasets to augment routine monitoring / provide platform for research (Hospital Episode Statistics, clinical patient networks, primary care surveillance)
    • e.g. project linking patients on UK Renal Registry (all patients undergoing renal dialysis) to bacteraemia surveillance data to identify risk factors and impact on mortality
  • summary & conclusions
    • probabilistic linkage offers a viable technique to link ‘difficult’ datasets
    • method can be amended depending on intended use, e.g. use of a single threshold to accept/reject matches where absolute certainty of match is not needed
    • data sharing between health sector organisations is providing unique opportunities for public health research (powerful studies at relatively low cost + pursuit of novel research questions through access to new information)
    • ensuring public trust and confidence in security of data, and demonstrating public benefit, is essential
  • acknowledgements
    • Study Team: Nicki Potz, Senior Scientist; David Powell, Database Manager; David Bridger, Research Nurse
    • Additional members of the Project Board: Andrew Chronias, HPA; Clare Griffiths, Office for National Statistics (ONS); Nourieh Hoveyda, ONS; Cleo Rooney, ONS; Levin Wheller, ONS; Jennie Wilson, HPA; Richard Pebody, ONS/HPA
    • Steering Group: Georgia Duckworth, HPA; Joy Dobbs, ONS; Peter Goldblatt, ONS; Andrew Phillips, University College London; Sarah Scobie, National Patient Safety Agency; Robert Spencer, Hospital Infection Society.
    • Funders: Department of Health for England
    • Enhanced S. pneumoniae surveillance provided courtesy of HPA Respiratory and Systemic Infection Laboratory and HPA Immunisation Department
    • We thank our microbiology colleagues in laboratories across the UK for their continued reporting of infectious diseases.