ClinicalCodes.org: An online
repository of clinical code lists for
primary care database research
David A. Springate, University of Manchester
Centres for Primary Care and Biostatistics
Outline
1. The clinical code problem
2. www.ClinicalCodes.org
3. Motivations
Primary Care Database study popularity
Number of UK PCD
publications is rapidly
increasing
[Figure: PCD articles in PubMed. x-axis: year (1990-2010); y-axis: number of articles (0-150)]
There is global interest in UK PCD
research
Institutions affiliated with UK PCD publications
Addressing concerns about the validity of
PCD-based studies. . .
Active areas of research:
• Data quality
• Data completeness
• Confounding
But addressing these assumes that the underlying
definitions of clinical entities are valid!
Deciding on a code list. . .
Nicholson A, Ford E, Davies KA, Smith HE, Rait G, et al. (2013)
Optimising Use of Electronic Health Records to Describe the Presentation of Rheumatoid Arthritis in
Primary Care: A Strategy for Developing Code Lists.
PLoS ONE 8(2): e54878. doi:10.1371/journal.pone.0054878
Code list? What Code list?
• The vast majority of PCD studies do not publish their
codes
• Funding bodies, journals and databases currently do not
require code lists to be published
• No centralised repository for clinical codes
In 45 UK PCD
case-control studies
on diabetes:
• Only 5 reported
ANY clinical codes
at all
• Only 2 of these
published codes in
appendix
• Only 1 provided
full set of code
lists
“Potential cases of diabetes were identified using predefined diabetes codes and prescriptions of oral anti-diabetics and insulin”
“Cases with DM were included in the analyses if they had a first-time DM code recorded plus at least one prescription for an anti-diabetic drug”
“Using medical READ codes, we identified all subjects in the GPRD who had a first-time diagnosis of …”
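A case definition like the one quoted above (a DM code plus at least one prescription for an anti-diabetic drug) depends entirely on which codes are in the lists. The following is a minimal sketch of such a rule, not the method of any of the quoted studies: the code values are placeholders rather than real Read or drug codes, the `Event` record and `find_cases` function are hypothetical names, and the "first-time" timing requirement is ignored for brevity.

```python
from dataclasses import dataclass

# Placeholder values for illustration only; these are NOT real Read or drug codes.
DIABETES_CODES = frozenset({"DX_001", "DX_002"})
ANTIDIABETIC_CODES = frozenset({"RX_001", "RX_002"})


@dataclass(frozen=True)
class Event:
    """One coded entry in a patient record (hypothetical structure)."""
    patient_id: int
    kind: str  # "clinical" or "prescription"
    code: str


def find_cases(events, diagnosis_codes, drug_codes):
    """Return ids of patients with at least one diagnosis code
    and at least one prescription code from the given lists."""
    diagnosed = {e.patient_id for e in events
                 if e.kind == "clinical" and e.code in diagnosis_codes}
    treated = {e.patient_id for e in events
               if e.kind == "prescription" and e.code in drug_codes}
    return diagnosed & treated


EVENTS = [
    Event(1, "clinical", "DX_001"),
    Event(1, "prescription", "RX_001"),
    Event(2, "clinical", "DX_002"),      # diagnosed, never treated
    Event(3, "prescription", "RX_002"),  # treated, no diagnosis code
]

# Same data, same rule, two different diagnosis code lists.
cases = find_cases(EVENTS, DIABETES_CODES, ANTIDIABETIC_CODES)
narrower = find_cases(EVENTS, frozenset({"DX_002"}), ANTIDIABETIC_CODES)
print(sorted(cases), sorted(narrower))
```

With the same records and the same rule, the two code lists select different patients, which is why a reader cannot assess or repeat the case definition without the lists themselves.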
Code lists are not available. . . So
what?
Codes not subject to scrutiny or peer review
• No way of knowing if a
condition diagnosis is valid
• Clinical decisions based on
invalid condition definitions
(Even though the analysis is
rigorous)?
No way to replicate
research
“Non-reproducible single
occurrences are of no
significance to science.”
- Karl Popper (1959)
“An experiment is
reproducible until
another laboratory tries
to repeat it.”
- Alexander Kohn
http://xkcd.com/242
Difficulties in comparing studies
• Definitions change over time
• GPs may change coding
practice in response to
regulations/incentives (e.g.
QOF)
• Different studies may use
different markers (test scores,
drugs, symptoms etc.)
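When two studies do publish their code lists, the difference between their definitions can at least be measured. The following is a rough sketch of one way to do this, assuming each list is simply a collection of code strings; the function name `compare_code_lists` and the example codes are hypothetical placeholders, and the Jaccard index is used here only as one convenient overlap measure.

```python
def compare_code_lists(list_a, list_b):
    """Summarise the overlap between two code lists."""
    a, b = set(list_a), set(list_b)
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: shared codes / all codes; two empty lists count as identical.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Placeholder codes, not taken from any real coding system.
study_2005 = ["A1", "A2", "A3"]
study_2012 = ["A2", "A3", "A4"]

summary = compare_code_lists(study_2005, study_2012)
print(summary)
```

A low overlap between two lists for the same condition would suggest that the studies are not measuring quite the same thing, whatever the cause (changed definitions, changed coding practice or different markers).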
Have to build new code lists for known
conditions from scratch
www.ClinicalCodes.org
ClinicalCodes.org
... an online repository for primary
care database researchers to upload
and download clinical code
definitions
• Deposit code lists upon
publication
• Download historical code lists
• Archive for all QOF business
rules from 2004
• Metadata
• Unique URI
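As an illustration of what a deposited code list with metadata and a unique URI might look like, here is a minimal sketch. It is not the ClinicalCodes.org schema or API: the `CodeList` class, its field names, the CSV column layout, the codes and the example.org URI are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CodeList:
    """A code list plus the metadata needed to cite and reuse it (hypothetical layout)."""
    name: str
    coding_system: str  # e.g. "Read", "ICD-10", "SNOMED", "ICPC"
    citation: str       # the article the list was published with
    uri: str            # stable identifier for this list
    codes: dict = field(default_factory=dict)  # code -> description

    def to_csv(self):
        """Serialise the list as CSV text, one row per code."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["code", "coding_system", "description", "list_name", "uri"])
        for code, description in sorted(self.codes.items()):
            writer.writerow([code, self.coding_system, description, self.name, self.uri])
        return buffer.getvalue()


# Placeholder content: the codes, citation and URI below are invented for the example.
example = CodeList(
    name="diabetes-diagnosis",
    coding_system="Read",
    citation="Placeholder et al. (year), placeholder journal",
    uri="https://example.org/codelists/diabetes-diagnosis",
    codes={"DX_002": "placeholder description B", "DX_001": "placeholder description A"},
)

csv_text = example.to_csv()
print(csv_text)
```

Keeping the coding system and URI on every row means a downloaded file still identifies itself if it is separated from the article it came from.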
ClinicalCodes.org
Codes can be hosted for
• Diagnoses
• Drug exposures
• Tests
• Procedures
• Outcomes
Different coding systems
• Read
• ICD9/10
• SNOMED
• ICPC
ClinicalCodes.org users
1. PCD clinical researchers
• Validation of PCD studies
• Building on previous code lists
• Matching appropriate disease definitions in time
ClinicalCodes.org users
2. Informaticians / ‘meta-analysts’
• Study replications across databases
• Tracking changes in disease definitions and doctors’
coding practice through time
• Research objects
Bechhofer S, Buchan I, De Roure D, Missier P, Ainsworth J, Bhagat J, Couch P, Cruickshank D,
Delderfield M, Dunlop I, Gamble M, Michaelides D, Owen S, Newman D, Sufi S, Goble C. (2013)
Why linked data is not enough for scientists
Future Generation Computer Systems 29(2): 599-611.
http://dx.doi.org/10.1016/j.future.2011.08.004
Why would I want to upload my codes?
“I’ve spent months building these code lists – I don’t want to give all my good ideas away to other groups for nothing!”
“I am very busy and I don’t have time to upload my codes!”
“I will not download codes so what’s the benefit to me?”
“Publishing codes will expose the flaws in my coding strategy!”
Motivations
Motivations
• Upload is simple and
painless
• Faster and more
consistent development
of new code lists
• Exposure and potential
citations
• Improvements in
research quality
• A way of finding out
who is working in the
same area
Motivations
• Uploading should be
required by
– Journals
– Funding bodies
– Databases (ISAC)
• Movement towards open
data and greater
transparency in
publishing protocols
• Research without
accessible code lists is of
questionable validity...
Thank you!
Any questions?
david.springate@manchester.ac.uk
@medcodes
www.ClinicalCodes.org
