Databases of electronic medical records, and in particular primary care databases (PCDs), are increasingly used in research. The largest PCDs contain full data on all primary care consultations by millions of patients over two or more decades. They provide a means of investigating important healthcare questions that cannot practically be addressed in a Randomised Controlled Trial. However, concerns remain about the validity of studies based on data from PCDs. Most work on validity has attempted to confirm individual data values within a dataset. We take a different approach and instead replicate published PCD studies in a second, independent PCD. Agreement of results then implies that the conclusions drawn are independent of the data source, though it does not rule out biases, such as confounding by indication, that affect both databases in the same way.
We replicated two previous PCD studies using the Clinical Practice Research Datalink (CPRD). The first was a retrospective cohort study, originally conducted in DIN-LINK, of the effect of beta-blocker therapy on survival in cancer patients. The second was a nested case-control analysis, originally conducted in QRESEARCH, of the effect of statins on mortality in patients with ischaemic heart disease.
Our analyses produced several important quantitative differences from the original studies, and these differences altered the conclusions. They could not be fully explained by either demographic differences between the patient samples or structural differences between the datasets. Our study highlights both the caution needed when assessing findings from the analysis of a single database and the difficulties of replicating existing PCD studies.