Understanding Gaps between Data Quality Checks and Research Capabilities in a Pediatric CDRN

Presentation at AMIA Joint Summits 2017

  1. Understanding the Gaps between Data Quality Checks and Research Capabilities in a Pediatric Data Research Network
     Ritu Khare, Hanieh Razzaghi, Levon Utidjian, Matthew Miller, L. Charles Bailey
     The Children’s Hospital of Philadelphia
  2. PEDSnet CDRN = 5.7M patients in pediatrics
     • Phase 1 (18 months): Initial Infrastructure
     • Phase 2 (9 months): Conduct Science Queries
  3. Data Quality Assessment in PEDSnet
     • Data ready for research use?
     • PEDSnet data quality workflow
       • Design data quality checks: https://github.com/PEDSnet/Data-Quality-Analysis
       • Identify data quality issues
     • Rate of extract-transform-load (ETL) errors reduced from >50% to <10% (Khare et al., JAMIA in press)

     Type of Check | Issue Example
     Missing data | Gestational age missing for 70% of patients
     Invalid value | Race outside the acceptable values in PEDSnet conventions
     Implausible event | Encounter start date after the end date
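
The three check types on this slide can be illustrated with a minimal Python sketch. This is not the PEDSnet implementation (that lives in the repository linked above); the record layout, the field names, and the `ALLOWED_RACE` value set are assumptions made up for the example.

```python
from datetime import date

# Hypothetical value set; the actual PEDSnet conventions define their own codes.
ALLOWED_RACE = {"Asian", "Black", "White", "Other", "Unknown"}

def missing_rate(records, field):
    """Missing data: fraction of records where `field` is None."""
    if not records:
        return 0.0
    return sum(r.get(field) is None for r in records) / len(records)

def invalid_values(records, field, allowed):
    """Invalid value: non-missing values of `field` outside `allowed`."""
    return [r[field] for r in records
            if r.get(field) is not None and r[field] not in allowed]

def implausible_encounters(encounters):
    """Implausible event: encounters whose start date is after the end date."""
    return [e for e in encounters if e["start"] > e["end"]]

# Toy data, invented for illustration only.
patients = [
    {"id": 1, "gestational_age": None, "race": "Asian"},
    {"id": 2, "gestational_age": 39, "race": "Martian"},
    {"id": 3, "gestational_age": None, "race": "White"},
    {"id": 4, "gestational_age": None, "race": "Unknown"},
]
encounters = [
    {"id": 10, "start": date(2015, 3, 2), "end": date(2015, 3, 1)},
    {"id": 11, "start": date(2015, 3, 2), "end": date(2015, 3, 4)},
]

print(missing_rate(patients, "gestational_age"))
print(invalid_values(patients, "race", ALLOWED_RACE))
print([e["id"] for e in implausible_encounters(encounters)])
```

Each function returns the evidence for an issue rather than a pass/fail verdict, so a reviewer can decide whether the finding is a real ETL error.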
  4. PEDSnet Phase 1: Data Quality Assessment
     [Figure: number of data quality checks (y-axis, 0 to 700) per data cycle (x-axis, cycles 1 to 9), Jan 2015 to May 2016]
     • THEORY-DRIVEN: Frameworks, methods in literature (Brown et al. 2013, Weiskopf and Weng, 2013, Kahn et al. 2015)
     • DEVELOPER-DRIVEN
       • 50 members in informatics team
       • Data and issue review
  5. PEDSnet Phase 2: Conducting Science Queries
     • >30 scientific studies: Computable phenotypes, feasibility queries, association studies, etc.

     Site | % children with CT-scan during ED visits in 2013-2016
     A | 3.32%
     B | 4.87%
     C | 3.58%
     D | 2.98%
     E | 0.11%
     F | 3.62%
     G | 5.11%
     H | 5.92%

     • Incorrect mapping of CT-scan procedure
     • Invalid coding of ED visits
     • Bug in the query
     • True anomaly
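
One way to surface a site like E from the table on this slide is to compare each site's rate with the network median. The sketch below is an illustration only: the ratio rule and the `max_ratio` cutoff of 4 are assumptions chosen for the example, not a method described in the presentation.

```python
from statistics import median

# Percentages copied from the slide: % children with CT-scan during ED visits.
ct_scan_rate = {
    "A": 3.32, "B": 4.87, "C": 3.58, "D": 2.98,
    "E": 0.11, "F": 3.62, "G": 5.11, "H": 5.92,
}

def flag_sites(rates, max_ratio=4.0):
    """Flag sites whose rate differs from the median by more than `max_ratio` times.

    `max_ratio` is a hypothetical cutoff; a real program would tune it per query.
    """
    mid = median(rates.values())
    flagged = []
    for site, rate in rates.items():
        if rate * max_ratio < mid or rate > mid * max_ratio:
            flagged.append(site)
    return flagged

print(flag_sites(ct_scan_rate))
```

A flag like this only prompts a review. Deciding among the explanations listed on the slide (incorrect mapping, invalid coding, a query bug, or a true anomaly) still takes manual investigation.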
  6. PEDSnet Phase 2: Data Quality Assessment
     • USER-DRIVEN: >75 new issues, and 8 new check types

     Check Type | Issue Example
     Outliers in derived values | Average length of inpatient stays
     Inconsistency between similar concepts captured in different tables | Specialty data in provider vs. care_site tables
     Incorrect mapping from EHR to PEDSnet | Mapping of labs to LOINC
     Missing Expected Facts | GI Providers, creatinine labs, etc.
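
The cross-table inconsistency check (specialty in provider vs. care_site) can be sketched as a simple join and comparison. The column names below (`provider_id`, `care_site_id`, `specialty`) are assumed for illustration and may not match the PEDSnet data model.

```python
def specialty_mismatches(providers, care_sites):
    """Return (provider_id, provider_specialty, site_specialty) where the two disagree.

    Rows with a missing specialty on either side are skipped; those belong to a
    separate missing-data check.
    """
    site_specialty = {s["care_site_id"]: s["specialty"] for s in care_sites}
    mismatches = []
    for p in providers:
        site_value = site_specialty.get(p["care_site_id"])
        if site_value is None or p["specialty"] is None:
            continue
        if site_value != p["specialty"]:
            mismatches.append((p["provider_id"], p["specialty"], site_value))
    return mismatches

# Toy rows, invented for illustration only.
care_sites = [
    {"care_site_id": 1, "specialty": "Gastroenterology"},
    {"care_site_id": 2, "specialty": "Cardiology"},
]
providers = [
    {"provider_id": 100, "care_site_id": 1, "specialty": "Gastroenterology"},
    {"provider_id": 101, "care_site_id": 2, "specialty": "Nephrology"},
    {"provider_id": 102, "care_site_id": 2, "specialty": None},
]

print(specialty_mismatches(providers, care_sites))
```

A mismatch is not necessarily an error, since a provider may legitimately practice at a site with a different specialty, so the output is best treated as a list of candidates for review.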
  7. PEDSnet Phase 2: Data Quality Assessment

     Check Type | Issue Example
     Unexpected Facts | Procedures recorded in the condition table
     Variability in coding | Different concepts used to represent same lab or vitals
     Unexpected most frequent values | “shooting pain” identified as top inpatient visit condition
     Face validity issues | Tables with unexpectedly low number of records
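
Two of the checks on this slide, unexpected most frequent values and face validity of table sizes, lend themselves to a short sketch. The expected-condition list, the table names, and the minimum row counts below are hypothetical inputs that a reviewer would have to supply.

```python
from collections import Counter

def unexpected_top_values(values, expected, n=1):
    """Return the `n` most frequent values that are not in the `expected` set."""
    top = Counter(values).most_common(n)
    return [(value, count) for value, count in top if value not in expected]

def low_volume_tables(row_counts, min_expected):
    """Return tables whose row count falls below a reviewer-supplied minimum."""
    return [table for table, count in row_counts.items()
            if count < min_expected.get(table, 0)]

# Toy data, invented for illustration only.
inpatient_conditions = ["shooting pain"] * 5 + ["asthma"] * 3 + ["pneumonia"] * 2
expected_top = {"asthma", "pneumonia", "bronchiolitis"}  # hypothetical list

row_counts = {"condition_occurrence": 120000, "measurement": 40}
min_expected = {"condition_occurrence": 1000, "measurement": 1000}  # hypothetical

print(unexpected_top_values(inpatient_conditions, expected_top))
print(low_volume_tables(row_counts, min_expected))
```

Both checks depend on clinical judgment for their reference values, which is consistent with the slide's point that these issues were user-driven rather than derived from theory.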
  8. PEDSnet Phase 2: Check Design Challenges
     • Determine the combination of fields / tables
       • ~100 fields in PEDSnet data model
     • Determination of outlier
       • Differentiate between true anomaly and real data quality issue
     • Determination of thresholds
       • Experimentation with datasets
     • Automatic review of ETL mappings
       • labs, organisms, specialty, route, race, ethnicity, drugs, language, procedure, smoking history
       • 1000s of manually derived mappings
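
As a rough illustration of what an automatic review of ETL mappings might look like, the sketch below screens a list of (source term, target code) pairs for two problems: targets that are not in the accepted vocabulary, and source terms mapped to more than one target. The function, the sample mappings, and the `valid_targets` set are assumptions for the example; the presentation lists this as an open challenge and does not describe a solution.

```python
from collections import defaultdict

def review_mappings(mappings, valid_targets):
    """Screen (source_term, target_code) pairs for two kinds of problems.

    Returns:
      invalid     - pairs whose target code is not in `valid_targets`
      conflicting - source terms mapped to more than one target code
    """
    invalid = [(src, tgt) for src, tgt in mappings if tgt not in valid_targets]
    targets_by_source = defaultdict(set)
    for src, tgt in mappings:
        targets_by_source[src].add(tgt)
    conflicting = {src: sorted(tgts)
                   for src, tgts in targets_by_source.items() if len(tgts) > 1}
    return invalid, conflicting

# Toy mappings; "9999-X" is a deliberately invalid placeholder code.
valid_targets = {"2160-0", "718-7"}
mappings = [
    ("Creatinine, serum", "2160-0"),
    ("Hemoglobin", "718-7"),
    ("Hemoglobin", "9999-X"),
    ("Glucose", "9999-X"),
]

invalid, conflicting = review_mappings(mappings, valid_targets)
print(invalid)
print(conflicting)
```

A screen like this catches structural problems only. Whether a valid code is the clinically correct target for a given source term still requires expert review of the manually derived mappings.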
  9. Conclusions
     • A new (user-driven) perspective on data quality
     • Usability evaluation of PEDSnet data quality assessment program
       • 20% increase in types of checks
     • Future work
       • Investigate the Phase 2 check design challenges
       • Reverse engineering of checks from issues identified in science queries
  10. Acknowledgments
     • PEDSnet Teams
       • Leadership and governance
       • Informatics
       • Pilot studies
     • PCORnet Governance Committees and DRN OC
     • OHDSI Consortium
     • Patients and Families
     • This work was supported by PCORI Contract CDRN-1306-01556.
     • PEDSnet Data Quality Scripts: https://github.com/PEDSnet/Data-Quality-Analysis
