Turning Junk Data into Value
Using 9-digit Mixed Identifiers to Enhance Linkage Results
for Utah Controlled Substance Database
Yukiko Yoneoka¹, MS, Wu Xu¹, PhD, Zhiwei Liu¹, MS, Brian Sauer², PhD, Robert Rolfs¹, MD
¹Utah Department of Health and ²University of Utah, Salt Lake City, Utah

Background

Public Health Problem
In the past seven years, the state of Utah has experienced increasing numbers of unintentional deaths due to prescription pain medication overdoses.

Controlled Substance Database (CSDB)
Pharmacies submit controlled substance dispensing records with patient information to the CSDB without strict data standards. As a result, considerable variability is found in the required patient information fields.

Challenge: Customer ID
Since standardization is not enforced for the customer ID field, it contains different types of IDs, such as pharmacy-specific ID, SSN, driver's license number, passport number, phone number, names and other text. This inconsistency makes it challenging for researchers to accurately construct patient-centered prescription records across pharmacy records.

Objective
The purpose of this study was to salvage and utilize all 9-digit ID numbers from the customer ID variable. We then examined how adding the 9-digit mixed ID as a linkage variable affected linkage results.

Methods

Data
CSDB 2006-2007 contained 9,342,994 prescription records. Of those, 6,212 records (0.07%) from veterinarians or other non-medical prescribers were excluded from analysis. A total of 9,336,782 prescription records were used.

The 9-digit Mixed ID
The 9-digit mixed ID was populated with customer ID content, using:
1) Length of 9 after stripping off attached characters (e.g., SSN, SS#, UTDL, UDL);
2) Driver's license number validation by the algorithm that assigns Utah 9-digit driver's license numbers, $(9, 8, 7, 6, 5, 4, 3, 2, 1) \cdot (d_1, d_2, d_3, d_4, d_5, d_6, d_7, d_8, d_9) \equiv 0 \pmod{10}$; and
3) Valid range of the first 3 digits of SSN.

About 35.5% (3,313,731) of all prescription records carried some type of 9-digit ID. The distribution of content of the 9-digit mixed ID is indicated in the table below.

| Breakdown of 9-digit mixed ID | Number (%) |
|---|---|
| SSN | 1,108,388 (33.4) |
| UTDL | 1,070,071 (32.3) |
| Other 9-digit ID | 1,135,272 (34.3) |
| Total | 3,313,731 (100.0) |

Linkage
The Link King© v.7, a free SAS-based linkage software, was used on SAS v.9.1.3. Variables used for linkage were: first, middle and last names; date of birth (DOB); gender; zip code; and the 9-digit mixed ID. The data were linked first with the 9-digit mixed ID, then without. The results were compared based on The Link King's result reports.

Result

How Using 9-digit Mixed ID Enhanced Linkage Results

Certainty of Match Found by The Link King (number of matched pairs)

| Level of Certainty | With 9-digit Mixed ID | Without 9-digit Mixed ID |
|---|---|---|
| Level 1: Highest possible | 465,237 (97.2%) | 369,940 (77.4%) |
| Level 2: Very High | 13,056 (2.7%) | 107,309 (22.5%) |
| Level 3: High | 370 (<1%) | 676 (<1%) |

About 20% more matched pairs were found with the highest possible level of certainty by using 9-digit mixed IDs (97.2% versus 77.4%).

Method of Linking Used by The Link King (number of matched pairs)

| Method of Linking | With 9-digit Mixed ID | Without 9-digit Mixed ID |
|---|---|---|
| Both Det. & Prob. | 434,329 (90.7%) | 369,088 (77.2%) |
| Probabilistic Only | 34,565 (7.2%) | 81 (<1%) |
| Deterministic Only | 9,769 (2%) | 108,756 (22.8%) |

About 14% more matches were found by both deterministic and probabilistic linking methods, and 7% more by the probabilistic method only, by using 9-digit mixed IDs. Without them, linkage depended far more heavily on the deterministic method only.

| Major blocking criteria where match was found by The Link King | With 9-digit Mixed ID | Without 9-digit Mixed ID |
|---|---|---|
| 9-digit Mixed ID only | 60,031 (12.5%) | N/A |
| Last Name & DOB | 407,105 (85%) | 464,732 (97.2%) |
| First, Middle and Last Names only | 7,879 (1.7%) | 9,613 (2%) |
| First Name & DOB | 1,941 (0.4%) | 1,934 (0.4%) |
| First and Last Names & Birth Year | 1,028 (0.2%) | 990 (0.2%) |
| First and Last Names & Birth Month | 575 (0.1%) | 565 (0.1%) |

In the blocking process, about 13% of all matches were found by 9-digit mixed ID match alone.

Conclusion
Retrieving 9-digit IDs from a mix of data collected in a customer ID field to create a mixed ID field as a linkage variable would be a worthwhile practice, considering the enhanced quality of the linkage results.

Acknowledgements
This study is supported by CDC Grant No. P01 CD000284-03, P.I. Matthew Samore, Utah Research Center for Excellence in Public Health Informatics. Many thanks to Nancy McConnell for her valuable suggestions.

Contact
Yukiko Yoneoka, y.yoneoka@utah.gov

2009 AMIA Spring Congress, Orlando FL (May 28 - May 30, 2009)
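
The three rules listed under "The 9-digit Mixed ID" can be sketched roughly as follows. This is a minimal illustration, not the authors' SAS code: the function names, the set of stripped separators, the treatment of labels as prefixes only, and the order in which the checks are applied are all assumptions. The poster does not state which SSN range was used, so the sketch only rejects area numbers that are never issued (000, 666, and 900-999).

```python
import re

# Labels the poster gives as examples of characters attached to the ID.
# Treating them as prefixes only is an assumption of this sketch.
PREFIXES = ("UTDL", "SSN", "SS#", "UDL")

# Weights from the poster's check: (9, 8, ..., 1) . (d1, ..., d9) = 0 mod 10.
WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, 1)


def extract_nine_digits(raw):
    """Rule 1: return the ID if exactly nine digits remain after stripping."""
    text = raw.strip().upper()
    for prefix in PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    # Assumed separators: whitespace, hyphen, colon, hash, period.
    text = re.sub(r"[\s\-:#.]", "", text)
    return text if re.fullmatch(r"\d{9}", text) else None


def passes_utdl_check(digits):
    """Rule 2: weighted digit sum must be divisible by 10."""
    total = sum(w * int(d) for w, d in zip(WEIGHTS, digits))
    return total % 10 == 0


def plausible_ssn_area(digits):
    """Rule 3 (assumed range): area number is not 000, 666, or 900-999."""
    area = int(digits[:3])
    return area != 0 and area != 666 and area < 900


def classify_customer_id(raw):
    """Return (nine_digit_id, category) or (None, None).

    Checking the license rule before the SSN rule is an assumption;
    the poster does not say how overlapping cases were resolved.
    """
    digits = extract_nine_digits(raw)
    if digits is None:
        return None, None
    if passes_utdl_check(digits):
        return digits, "UTDL"
    if plausible_ssn_area(digits):
        return digits, "SSN"
    return digits, "OTHER"


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    for sample in ["UTDL 100000001", "SS#123-45-6789", "801-555-1234", "JOHN SMITH"]:
        print(sample, "->", classify_customer_id(sample))
```

Because the last weight is 1, about one in ten arbitrary 9-digit strings passes the mod-10 check, so a check like this can only suggest, not confirm, that a value is a driver's license number.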
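
The blocking result, where some matches were found by the 9-digit mixed ID alone, can be illustrated with a generic blocking sketch. This is not how The Link King is implemented; it is a simplified picture of blocking in general, and the record fields (`last`, `dob`, `mixed_id`), function names, and sample records are all invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key_func):
    """Group records by a blocking key and return index pairs sharing a key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        key = key_func(rec)
        if key is not None:  # records without a usable key join no block
            blocks[key].append(idx)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs


def by_last_name_dob(rec):
    """Blocking key: last name plus date of birth."""
    if rec.get("last") and rec.get("dob"):
        return (rec["last"].upper(), rec["dob"])
    return None


def by_mixed_id(rec):
    """Blocking key: the salvaged 9-digit mixed ID, if any."""
    return rec.get("mixed_id") or None


# Invented records: the third has a misspelled surname but the same ID.
records = [
    {"last": "Smith", "dob": "1970-01-02", "mixed_id": "100000001"},
    {"last": "Smith", "dob": "1970-01-02", "mixed_id": None},
    {"last": "Smyth", "dob": "1970-01-02", "mixed_id": "100000001"},
]

pairs_without = candidate_pairs(records, by_last_name_dob)
pairs_with = pairs_without | candidate_pairs(records, by_mixed_id)

print("without mixed ID:", sorted(pairs_without))
print("with mixed ID:   ", sorted(pairs_with))
```

In this toy data the pair formed by the first and third records is reachable only through the mixed ID block, which is the kind of match the "9-digit Mixed ID only" row counts.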
