David Eastman, KP CHR Southeast, Atlanta, GA
Don Bachman, MS, KP CHR, Portland, OR
Daniel Ng, BSE, MBA, KP DOR, Oakland, CA
Wei Tao, MS, KP DOR, Oakland, CA
Topics
  Background
  Survey of death data sources
  The SOURCE variable
  Methods of weaving death data together
  The CONFIDENCE variable
  Inter-source agreement analysis at KPGA
  VDW death data QA program preliminary findings (sprinkled throughout)
Background
  VDW death files contain:
    Dates of death
    Qualifiers: data source, confidence, date imputation flag
    Causes of death
  Typically, VDW sites have access to multiple sources of death data
  How data are woven together varies considerably
Death Data Sources
  HMO Membership
  Clarity Patient table
  Common membership
  Hospital Discharges
  State Death Certificates
  Social Security Administration
  National Death Index
  Tumor Data
  Clarity “Death Notes”
HMO Death Data: Pros and Cons
Pros
  No probabilistic matching; unlikely to be the wrong person
  Gold standard at some HMOs
Cons
  No cause of death information
  No inactive (prior) member deaths; death after disenrollment will probably be missed
  At some HMOs, family/employer must notify the HMO; deaths are less rigorously reported and dates may be inaccurate
  At other sites, hospital, home health and hospice care are well integrated in the EMR and provide very reliable death dates
  At some sites, this method is more prone to false negatives than Gov’t data
Gov’t Death Data: Pros and Cons
Pros
  HMO enrollment status at time of death is irrelevant; death after disenrollment is more likely to be captured if it is part of the matching algorithm
  Some gov’t sources contain cause of death information
Cons
  Probabilistic matching on names/dates/SSN/etc.; wrong person may get matched
  Some sites cannot match on SSN, which makes the method less reliable
  Some sites do the matching themselves; some only get matches from the gov’t
  May be more far reaching than HMO data, but may not include deaths outside of HMO’s state(s)
  At some sites, this method is more prone to false positives than HMO data
The SOURCE Variable
Spec definition: Source of death data?
Spec values:
  S = State Death files
  N = National Death Index
  T = Tumor data
  Others are locally defined
Based on preliminary QA results from 7 sites:
  5 sites use the State Death files (S)
  1 site uses National Death Index data (N)
  2 sites use the Tumor data (T)
  7 sites include “other” local codes
Methods of Weaving Death Data Together
Descriptions of methods used at:
  KPGA
  KPNC
  KPNW
KPGA Method - Step 1
Merge all possible death data into a research data warehouse table
KPGA Method - Step 2
Select the “best quality” data to populate the VDW
  HMO sources favored (vs. Gov’t sources)
  Confidence variable: source agreement & postmortem activity
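A minimal sketch of how a Step 2 style selection might look in Python. The slides say only that HMO sources are favored over Gov’t sources and that confidence reflects source agreement and postmortem activity; the "HMO" source code, the priority order, the record layout, and the E/F/P rules below are all hypothetical illustrations, not KPGA's actual logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical priority order (lower = preferred): HMO data first, then Gov't sources.
SOURCE_PRIORITY = {"HMO": 0, "S": 1, "N": 2}


@dataclass
class DeathRecord:
    mrn: str
    source: str       # "S" and "N" follow the spec; "HMO" is a made-up local code
    death_date: date


def select_best(records: List[DeathRecord],
                last_activity: Optional[date] = None) -> Tuple[DeathRecord, str]:
    """Pick one record per member and assign an assumed confidence value."""
    # Favor the highest-priority source; unknown sources sort last.
    best = min(records, key=lambda r: SOURCE_PRIORITY.get(r.source, 99))

    # Source agreement: a different source reports the same death date.
    corroborated = any(
        r.source != best.source and r.death_date == best.death_date
        for r in records
    )
    # Postmortem activity: the member has activity dated after the death date.
    postmortem = last_activity is not None and last_activity > best.death_date

    if postmortem:
        confidence = "P"   # assumed rule: activity after death casts doubt
    elif corroborated:
        confidence = "E"   # assumed rule: two sources agree
    else:
        confidence = "F"   # assumed rule: single uncorroborated source
    return best, confidence
```

For example, a member with matching HMO and state records and no later activity would come out as the HMO record with confidence "E" under these assumed rules.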
KPNC Method
1. Input Pre-Processing
   Combine member records containing demographic variables, contact dates, and membership dates
2. QualityStage matching
   Probabilistic matching of KPNC members to CA state and SSA death records
3. Initial Filtering
   Filter large number of match output records down to manageable size
   Resulting files (KPNC-CA and KPNC-SSA matches) have multiple matches per MRN
4. Ranking & Selection
   Select the single, best match per MRN based on weighted comparison of match linkweights, demographic vars, and contact and membership dates
5. Assign Final Variables
   Select best death date
   Assign scores for overall confidence and confidence of CA and SSA matches
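The Ranking & Selection step (step 4) could be sketched roughly as follows. This is an illustration only: the weights, the field names, and the assumption that each component is already scaled to 0-1 are hypothetical, and the actual KPNC weighting is not given in the slides.

```python
from typing import Dict, List

# Hypothetical weights for the three kinds of evidence named in step 4.
WEIGHTS = {"linkweight": 0.5, "demographic": 0.3, "dates": 0.2}


def score(match: Dict) -> float:
    """Weighted sum of the (assumed 0-1 scaled) evidence components."""
    return sum(WEIGHTS[key] * match[key] for key in WEIGHTS)


def best_match_per_mrn(matches: List[Dict]) -> Dict[str, Dict]:
    """Keep the single highest-scoring candidate match for each MRN."""
    best: Dict[str, Dict] = {}
    for match in matches:
        current = best.get(match["mrn"])
        if current is None or score(match) > score(current):
            best[match["mrn"]] = match
    return best
```

Ties keep the first candidate seen; a real implementation would need an explicit tie-breaking rule.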
KPNW Method - Part 1
Internal KP data: only use reliable sources
1. Patient table from Clarity. Most reliable & best source of death dates based on internal validation and subsequent CESR QA.
2. Common Membership, including a specific death table (older sources don’t include death dates, but do correctly identify dead patients)
3. KPNW tumor registry
4. Probabilistic match of KP members to OR and WA state data by CHR staff (unlike many other sites). OR & WA state don’t do the matching and won’t share SSNs. CHR staff match members from the past 2 years to the state data. Only current source of cause of death. 18-36 month lag.
KPNW Method - Part 2
  Been creating death files for several years
  Death files only include those who we believe have truly died
  Death dates from KP internal data appear very reliable based on CESR QA
  Death dates from the Tumor Registry and state data are also excellent but not as good as internal KP data
  Death more than 2 years after disenrollment will probably be missed with current system
  Would benefit from switching to a common HMORN confidence variable algorithm
The CONFIDENCE Variable
Spec definition: “How you rate the accuracy of the observation based on source, match, # of reporting sources, discrepancies, etc.”
Spec values: E = Excellent, F = Fair, P = Poor
Based on preliminary QA results from 7 sites, by site:
  % E ranges from 20% to 100%
  % F ranges from 0% to 55%
  % P ranges from 0% to 50%
  % E + % F ranges from 50% to 100%
The CONFIDENCE variable is inconsistently implemented!
The CONFIDENCE Variable
What does the confidence variable measure?
  Likelihood of death?
  Accuracy of the death date?
  Likelihood that the cause of death information is linked to the correct person?
Inter-source Agreement Analysis at KPGA
  Where do data come from?
  Corroborated deaths
  Inter-source death date agreement
  Postmortem activity
  Confidence distribution
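One way inter-source death date agreement might be computed is sketched below. The function name, the MRN-to-date dictionaries, and the day tolerance are assumptions for illustration; the slides do not describe how KPGA measured agreement.

```python
from datetime import date
from typing import Dict


def date_agreement(src_a: Dict[str, date],
                   src_b: Dict[str, date],
                   tolerance_days: int = 0) -> float:
    """Share of members reported by BOTH sources whose death dates
    differ by at most tolerance_days (0 means exact agreement)."""
    shared = src_a.keys() & src_b.keys()   # corroborated deaths only
    if not shared:
        return float("nan")                # nothing to compare
    agree = sum(
        abs((src_a[mrn] - src_b[mrn]).days) <= tolerance_days
        for mrn in shared
    )
    return agree / len(shared)
```

Members reported by only one source are excluded from the denominator, so this measures date agreement among corroborated deaths rather than overall coverage.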
Recommendations
Create new confidence variables
  Confidence that the patient is really dead
  Confidence in the death date
  Confidence in the linkage to external source data
  KPNC has implemented these as local variables
Develop a common algorithm to determine the values of these confidence variables to give them a common meaning
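As a rough illustration of the first recommendation, the three proposed confidence variables could be carried as separate fields on each death record. The class and field names are hypothetical, and reusing the existing E/F/P scale for each one is an assumption; the slides do not specify names or value sets.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    # Reuses the existing spec values; doing so for the new variables is an assumption.
    EXCELLENT = "E"
    FAIR = "F"
    POOR = "P"


@dataclass(frozen=True)
class DeathConfidence:
    death: Confidence       # confidence that the patient is really dead
    death_date: Confidence  # confidence in the death date
    linkage: Confidence     # confidence in the linkage to external source data
```

Keeping the three ratings separate lets a record express, for example, high confidence that a death occurred alongside low confidence in the exact date.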