Poster presented at the SAPC 2013 conference, Nottingham, on 3 July 2013.

David A. Springate, Evangelos Kontopantelis, David Reeves
Primary Care Research group, Centre for Biostatistics, University of Manchester

Publications in healthcare research using electronic medical records (EMR) databases are increasing at an exponential rate. Because of the amount of data available in large EMR databases, it has been suggested that they may be able to provide results of equal validity to randomised controlled trials. EMR studies rely on clinical codes (such as Read codes) to provide a standardised and expressive means for medical professionals to record clinical information. The validity of these studies depends on, among other things, the validity of the clinical codes used to define the population of interest and their disease conditions.

Clinical codes should be held to the same scrutiny as other methods: if the inclusion/exclusion criteria for a given condition are invalid, then so is the rest of the study. It should also be possible to replicate a given study (e.g. in a different EMR database) from the information provided in the original paper, which is not possible if the lists of clinical code definitions are not provided. Furthermore, access to historical code lists allows researchers and clinicians to make incremental improvements to disease and other definitions, building on previous work and avoiding unnecessary repetition of it.

There is currently no obligation to publish clinical code lists and no centralised repository to hold them. Consequently, the vast majority of database studies do not publish their clinical codes and so cannot be fully validated or replicated. To illustrate this, we looked at 45 UK case-control EMR database studies indexed on PubMed and found that only five had any record of any clinical codes in their methodology sections.
Of these five, only two published code lists in online appendices, and only one provided a full set of codes that would allow proper replication of the study.

We have built an online repository where researchers can deposit their clinical codes in a standardised way at the time of publication, as well as download historical code lists from previous studies. We have uploaded a complete set of Read codes for all versions of the Quality and Outcomes Framework, and we encourage the deposition of all code lists published by major medical organisations.

Reproducibility and validity of EMR database studies would be greatly aided if deposition of all clinical codes were a prerequisite for publication of all future database studies. The ability to build on code lists from historical studies when developing new ones will also ease a considerable bottleneck in database study design, removing the need for a great deal of "reinventing the wheel" each time a new EMR-based study is undertaken.