Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

PEDSnet DQA CHOP Symposium


Published on

Poster: Data Quality Assessment in PEDSnet

Published in: Science
  • Be the first to comment

  • Be the first to like this

PEDSnet DQA CHOP Symposium

  1. 1. What  is  PEDSnet?                 The  prominent  considera0ons  in  building  PEDSnet  include:     •  lack  of  EHR  data’s  orienta0on  toward  research  analy0cs:  unstructured  and  locally   coded  data,  complex  clinical  workflows   •  seman0c  heterogeneity:  diverse  source  systems  (5  Epic,  2  Cerner,  1  AllScripts)  and   vocabularies  (ICD9,  RxNorm,  GPI,  CPT-­‐4,  etc.)   •  data  peculiari0es  in  pediatrics:  procedures  conducted  before  a  child  is  born  such  as   prenatal  surgery  or  tes0ng   With   the   ul0mate   goal   of   serving   a   variety   of   research   projects,   a   prerequisite   in   PEDSnet  is  to  ensure  that  the  network’s  data  is  “high  quality.”  Major  gaps  in  this  area   include   the   lack   of   a   clear   defini0on   of   “data   quality”   in   CDRNs   and   the   lack   of   standardized   guidelines   for   conduc0ng   data   quality   assessments   in   large-­‐scale   distributed  networks.                 We   implement   a   comprehensive   suite   of   data   quality   checks,   and   propose   a   semi-­‐ automa0c   workflow   to   conduct   data   quality   assessment   in   PEDSnet,   and   make   the   following  contribu0ons:   Data  Integra2on  and  Valida2on  in  PEDSnet   Fig  1  PEDSnet  Partner  Sites  (Total  8)   1.  L a r g e -­‐ s c a l e   d a t a   q u a l i t y   assessments  in  pediatrics   ü  ~4.7M   children   (nearly   5%   of   US   child  popula2on)   ü  Observa2onal  data  associated  with   >97M  clinical  encounters   Our  Contribu2ons   2.   Providing   a   real-­‐world   account   on   data   quality  issues  in  a  CDRN   ü  Iden2fied  and  tracked  the  causes  of  ~600   issues     ü  Longitudinal   analysis   showing   trends   of   causes  and  domains   Collabora0ons  across  mul0ple  ins0tu0ons  are   essen0al  to  achieve  adequate  cohort  sizes  in   pediatrics   research.   PEDSnet   is   a   newly   established   clinical   data   research   network   (CDRN)   that   aggregates   electronic   health   record   (EHR)   data   from   eight   of   the   na0on’s   largest  children’s  hospitals.  PEDSnet  is  part  of   a   much   larger   network,   PCORnet,   supported   by   the   Pa0ent-­‐Centered   Outcomes   Research   Ins0tute  (PCORI).       The  goal  of  PEDSnet  is  to  support  a  variety  of   research  projects:   •  Compara0ve  effec0veness  studies     •  Drug  safety  studies     •  Descrip0ve  studies     •  Computable  phenotypes   Identifying and Understanding Data Quality Issues in a Pediatric Distributed Research Network Ritu Khare1, Levon Utidjian1, Evanette Burrows1, Greg Schulte2, Kevin Murphy1, Sara Deakyne2, Richard Hoyt3, Nandan Patibandla4, Byron Ruth1, Aaron Browne1, Megan Reynolds3, Keith Marsolo5, Michael Kahn2, Charles Bailey1 1Children’s Hospital of Philadelphia, 2Children’s Hospital Colorado,3Nationwide Children’s Hospital, 4Boston Children’s Hospital, 5Cincinnati Children’s Hospital Medical Center PEDSnet  Data  Quality  Assessment  Workflow   Results   Data  Quality   Assessment   Program   Issue  Ranking   and   Adjudica2on     S1   Fig  2.  A  typical  data  cycle  in  PEDSnet  showing  communica0on  between  the  Data  Coordina0ng  Center  and  the  Partner  Sites   Key  Findings  &  Conclusions   Even  a`er  the  4th  data  cycle,   on  an  average  74  issues   iden0fied  per  site.     Prospec0ve  data  quality   assurance  is  important  but   not  sufficient  for  valida0ng   data  quality  in  CDRNs.   The  percentage  of  ETL  issues   decreased  across  successive   data  cycles.     A  strong  post  hoc  data   quality  assessment  program   is  necessary  to  ensure  data   quality  in  CDRNs.   The  total  number  of   provenance  issues  increased   across  cycles.   This  study  serves  as  a  data   educa0on  pladorm  where   the  experts  learn  and  evolve   the  data  quality  program   with  each  itera0on.     Related  Publica2on   Iden2fying  and  Understanding  Data  Quality  Issues  in  a  Pediatric  Distributed  Research  Network   Ritu  Khare;  Levon  U1djian;  Gregory  Schulte;  Keith  Marsolo;  Charles  Bailey   AMIA  Annual  Symposium  2015,  November  14-­‐18  2015,  San  Francisco,  CA   Future  Direc2ons   Implement  advanced  data  quality  checks,  e.g.   comparison  with  external  datasets  and  published   results   Use  the  data  quality  results  to  determine  the  analy0c   fitness  of  the  PEDSnet  dataset  with  respect  to  a  given   target  research  study   PEDSnet  Sites  (at  various  loca1ons)   1.  Shared  private  repository  for   extract-­‐transform-­‐load  (ETL)   scripts   2.  ETL  conven0ons  document     3.  Weekly  troubleshoo0ng  mee0ngs     Acknowledgments   This  work  was  supported  by  PCORI  Contract  CDRN-­‐1306-­‐01556.     The   inves0gators   would   like   to   thank   the   PEDSnet   informa0cs   team   for   their   efforts  ,  and  the  pa0ents  and  families  contribu0ng  to  the  PEDSnet  dataset.     Most  Frequent  Issues   Missing  Data   31%   En0ty  Outliers   15%   Unexpected  difference   from  previous  ETL  run   9%   Implausible  Dates   9%   Temporal  Outliers   6%   Unexpected  Facts   4%   Unexpected  concept   iden0fiers   3%   Numerical  Outliers   3%   u ETL:  due  to  an  ETL  programming  error  at  the  site  largely  due   to  incompleteness  or  ambiguity  in  the  conven0ons  document   u Provenance:  due  to  the  nature  of  EHR  data,  e.g.  data  entry   error,  clinical  &  administra0ve  workflows,  true  anomaly,  etc.   u I2b2  transform:  due  to  limita0ons  of  i2b2-­‐>PEDSnet   transforma0on  scripts   u Non-­‐issue:  false  alert  by  the  data  quality  assessment  program   Longitudinal  Analysis  of  Issues  Summary  of  Data  Quality  Issues   Contact  Person   Ritu  Khare,  PhD  (   PEDSnet  Data  Coordina2ng  Center    Prospec2ve  Data  Quality  Assurance   Aggregated   (OMOP  CDM)   S2   S3   S4   S5   S6   S7   S8   Types  of  Data  Quality  Checks  implemented  in  the  Program   Fidelity   the  degree  to  which  PEDSnet  data  accurately  reflect   the  source  systems  from  which  it  is  drawn     e.g.  the  distribu0ons  of  gender  values  do  not  match  between   the  source  system  (EHR)  and  the  derived  PEDSnet  dataset.     Consistency       the  degree  to  which  a  specific  type  of  informa0on  is   recorded  in  the  same  way  in  the  different  data  sources   contribu0ng  to  PEDSnet   Value  set  viola0ons  (e.g.  incorrect  concept  iden0fier  used  to   represent  no  informa0on  in  race  field)  or  mapping  issues  (e.g.  a   site  using  internal  codes  to  capture  drug  informa0on  cannot   completely  map  to  standard  RxNorm  codes)   Accuracy   the  degree  to  which  PEDSnet  data  correctly  reflect  the   clinical  characteris0cs  of  pa0ents     e.g.  a  site  having  significantly  larger  ra0o  for  average  number  of   facts  (observa0ons,    procedures,  or  drug  exposures)  per  pa0ent     Feasibility   the  degree  to  which  a  given  type  of  informa0on  is   actually  collected  and  available  in  PEDSnet     e.g.  the  recorded  gesta0onal  age  is  missing  for  70%  of  pa0ents   Expert  Review   Communica0on  of  issues  to  the  origina0ng  sites   At  the  end  of  the  4th  data  cycle,  the  data  quality  assessment   program  helped  iden0fy  591  data  quality  issues.   Domains  with  Most  Issues   Drug  Exposure   21%   Measurement   13%   Observa0on   12%   Condi0on  Occurrence   11%   Procedure  Occurrence     10%   Person   8%   Visit  Occurrence   7%   0   10   20   30   40   50   60   Jan  '15   (OMOP  v4)   Mar  '15   (OMOP  v4)   May  '15   (OMOP  v5)   Jul  '15   (OMOP  v5)   Percentage  of  Issues   Data  Cycles   Distribu2on  of  Issue  Causes  across  Data  Cycles   PCORnet     CDM   Final  Target