Methods for Evaluation of Medical
Terminological Systems
A Literature Review and a Case Study
D. G. T. Arts1, R. Cornet1, E. de Jonge2, N. F. de Keizer1
1 Academic Medical Center, Department of Medical Informatics, Amsterdam, The Netherlands
2 Academic Medical Center, Department of Intensive Care, Amsterdam, The Netherlands
Summary
Objectives: The usability of terminological systems
(TSs) strongly depends on the coverage and correctness
of their content. The objective of this study was to cre-
ate a literature overview of aspects related to the con-
tent of TSs and of methods for the evaluation of the
content of TSs. The extent to which these methods
overlap or complement each other is investigated.
Methods: We reviewed literature and composed defini-
tions for aspects of the evaluation of the content of TSs.
Of the methods described in literature three were se-
lected: 1) Concept matching in which two samples of
concepts representing a) documentation of reasons for
admission in daily care practice and b) aggregation
of patient groups for research, are looked up in the TS
in order to assess its coverage; 2) Formal algorithmic
evaluation in which reasoning on the formally repre-
sented content is used to detect inconsistencies; and
3) Expert review in which a random sample of concepts
is checked for incorrect and incomplete terms and relations.
These evaluation methods were applied in a
case study on the locally developed TS DICE (Diagnoses
for Intensive Care Evaluation).
Results: None of the applied methods covered all the
aspects of the content of a TS. The results of concept
matching differed for the two use cases (63% vs. 52%
perfect matches). Expert review revealed many more
errors and incompleteness than formal algorithmic
evaluation.
Conclusions: To evaluate the content of a TS, using a
combination of evaluation methods is preferable. Dif-
ferent representative samples, reflecting the uses of
TSs, lead to different results for concept matching. Ex-
pert review appears to be very valuable, but time con-
suming. Formal algorithmic evaluation has the poten-
tial to decrease the workload of human reviewers but
detects only logical inconsistencies. Further research is
required to exploit the potentials of formal algorithmic
evaluation.
Keywords
Terminological systems, evaluation, methods,
definitions
Methods Inf Med 2005; 44: 616–25
1. Introduction
Several developments in healthcare, such as
accountability of care and increased use of
electronic patient records, have led to an increased
need for accurate, detailed and
structured registration of medical data.
Many terminological systems (TSs) have
been and are still being developed to support
this. A TS interrelates concepts of a particular
domain and provides their terms and
codes [1]. The relations between the con-
cepts within a TS can be hierarchical (e.g.
Is-A) or non-hierarchical (e.g. has-loca-
tion). In addition some TSs hold (formal)
rules for the composition of new concepts
by combining existing concepts. Examples
of medical TSs are the International Classification
of Diseases (ICD) [2], the Systematized
Nomenclature of Medicine (SNOMED) [3, 4], and the North American
Nursing Diagnosis Association (NANDA)
terminology [5]. At the direction of the
Dutch National Intensive Care Evaluation
foundation (NICE)^a our department is engaged
in a continuous effort to develop a TS
and corresponding software for the domain
of intensive care (IC). This system is called
Diagnoses for Intensive Care Evaluation,
DICE [6].
For the study described here we distin-
guish two types of use cases for termino-
logical systems. On the one hand TSs are
used by medical staff to document medical
data, e.g. patient characteristics or treat-
ment, in the medical record. On the other
hand TSs are used to select homogeneous
patient groups for research or management
purposes. A medical researcher or manager
selects from the TS those concepts that define
a homogeneous patient group. After selecting
the appropriate concepts the researcher/manager
can identify patients that
fulfill the criteria, by searching their electronic
records.
Several authors have specified required
characteristics of a TS [7-9]. In 2000 a list of
standard requirements for TS was devel-
oped and approved by the International Or-
ganization for Standardization (ISO) [10].
In this study we will focus on requirements
related to the content of a TS, i.e. the con-
cepts, their terms, and the relations between
the concepts. The content of a TS is of ut-
most importance for its acceptance. A phy-
sician needs to be able to be complete and
sufficiently accurate in depicting the care
process, and clinical researchers need to be
able to be complete in selecting specific pa-
tient groups at any desired level of aggre-
gation. To realize this all concepts, terms
and relations belonging to the domain of the
TS should be represented and should be correct.
For example, we want sufficient terms
attached to a concept, and we want the terms
to be only the correct ones.
A number of methods to evaluate the
content of a TS have been described in the literature.
A literature study was performed to
gain insight into the several types of evaluation
methods. The diversity of the terminology
used in this context has prompted us
to compose definitions for the most prominent
expressions that are used in this article.
In addition, we present three common
evaluation methods that focus on (but are not
restricted to) the coverage and the correctness
of a TS’s content. These three methods have
been applied in a case study on the TS DICE
[6]. The aim of the case study was to analyze
the extent to which the results of the three
methods overlap or complement each other.
For this we compared the results produced
by each method.
^a The Dutch National Intensive Care Evaluation
(NICE) foundation: www.stichting-nice.nl
Methods Inf Med 5/2005. Received: April 15, 2004; accepted: March 15, 2005
© 2005 Schattauer GmbH
2. Literature Study
As mentioned in the introduction, in this
study we focus on the evaluation of the content
of a TS. To gain insight into methods for
evaluation of the content of TSs that have
been applied by others we performed a review
of relevant Medline-indexed journal
articles by using (combinations of) the following
keywords: evaluation, validation, assessment,
audit, terminological system, terminology,
ontology, classification, thesaurus,
nomenclature. In addition, articles were
retrieved from reference lists and personal
databases. An article was considered relevant
if it described the evaluation of the
quality of the content of a TS which was developed
for a medical domain. Articles were
selected from the past ten years.
2.1 Definitions
Our literature study uncovered some inconsistencies
in the terminology that is used in
this field. We therefore provide the
definitions that have been applied
in this study.
First of all we have restricted ourselves to
the evaluation of TSs’ content. By our definition,
the content of a TS includes concepts,
the terms attached to these concepts
and the relations between these concepts.
‘Concepts’ can be defined as ‘units of
thought formed by the characteristics of objects’.
Objects might be concrete things
such as the heart valve of patient X or abstract
things such as the pain of patient Y.
‘Terms’ are used to designate a concept. The
‘relations’ between concepts can be hierarchical
(e.g. Is-A) or non-hierarchical [1].
A term that frequently appears in the literature
on the evaluation of TSs is ‘domain
completeness’. ‘Domain completeness’
can be defined as the extent to which
the content of a TS covers the intended domain.
Domain completeness according to
this definition would be hard to measure or
to quantify, because the continuous changes
in medical knowledge make it impossible to
define exactly what comprises a particular
medical domain. Lacking such a gold
standard, it is impossible to determine to
what extent a TS is complete in covering the
intended medical domain. Instead, we could
take a subset of concepts or terms representative
for the intended domain and see to
what extent this subset is incorporated in a
particular TS. This way we will measure the
‘content coverage’.
Table 1 gives the definitions of ‘content
coverage’ and of the ‘coverage’ and the ‘correctness’
of the separate elements (i.e. concepts,
terms and relations) that comprise the
content of a TS.
The definitions of ‘concept coverage’
and ‘term coverage’ might look straightforward;
however the measured coverage can
be highly influenced by choices made in the
evaluation process. For example one can
choose whether or not to consider the occurrence
of specific concepts or terms in real
practice. Missing a frequently occurring
concept or term might be more severe than
missing those which are hardly ever used.
We therefore define ‘occurring coverage’
and ‘unique coverage’. For example in Figure 1
we see that the ‘occurring coverage’
is 80% (4/5), whereas the ‘unique coverage’
is 75% (3/4).
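The difference between the two measures can be illustrated with a minimal Python sketch. The function name `concept_coverage` and the example concepts are assumptions made for illustration; they are not taken from Figure 1, but they are chosen so that the figures 4/5 and 3/4 are reproduced.

```python
from collections import Counter


def concept_coverage(sample, ts_concepts):
    """Return (occurring coverage, unique coverage) of a sample by a TS.

    sample: list of concepts as they occur in practice (repeats allowed).
    ts_concepts: iterable of concepts present in the terminological system.
    """
    if not sample:
        raise ValueError("sample must contain at least one concept")
    ts = set(ts_concepts)
    occurrences = Counter(sample)
    # Occurring coverage counts every occurrence of a covered concept.
    covered_occurrences = sum(n for c, n in occurrences.items() if c in ts)
    # Unique coverage counts each distinct concept once.
    covered_unique = sum(1 for c in occurrences if c in ts)
    return covered_occurrences / len(sample), covered_unique / len(occurrences)


# Hypothetical data: one covered concept occurs twice, one concept is missing.
ts_concepts = {"pneumonia", "sepsis", "pancreatitis", "meningitis"}
sample = ["pneumonia", "pneumonia", "sepsis", "pancreatitis", "leprosy"]

occurring, unique = concept_coverage(sample, ts_concepts)
print(f"occurring: {occurring:.0%}, unique: {unique:.0%}")
```

In this sketch the frequently occurring concept raises the occurring coverage above the unique coverage; had the missing concept been the frequent one, the order would be reversed.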
Some TSs enable the composition of new
concepts by combining two or more existing
concepts. We call this feature “post-coordination”.
If a TS enables post-coordination
of concepts it is also important to
consider whether or not post-coordinated
concepts are taken into account when determining
the coverage of the concepts. Many
concepts might not be present in a TS as
a pre-coordinated concept, but can be composed
by combining two or more concepts.
If only pre-coordinated concepts are considered
in the evaluation, then the measured
concept coverage will be lower than when
post-coordinated concepts are also taken
into account.
In this study we also focus on the correctness
of the content of TSs. Correctness can
only be measured for concepts, terms and
relations that are covered by the TS. Definitions
of correctness are provided in Table 1.
Content coverage: The extent to which the content (e.g. concepts or terms) within a subset, representative for the domain of interest,
can be represented by the content of the terminological system.
  Concept coverage: the extent to which the concepts within a subset, representative for the domain of interest,
  can be represented by the concepts within the terminological system.
    Occurring concept coverage: concept coverage using a subset in which each concept may occur more than once,
    indicating the occurrence of that concept in practice.
    Unique concept coverage: concept coverage using a subset in which each concept occurs at most once.
    Post-coordinated concept coverage: the extent to which the concepts within a representative subset can be represented
    by the concepts (either pre-existing or created with use of composition rules) within the terminological system.
  Term coverage: the extent to which the terms within a representative subset exist in the terminological system's content,
  provided that the terms relate to concepts that are present in the terminological system.
    Occurring term coverage: term coverage using a subset in which each term may occur more than once, indicating the
    occurrence of that term in practice.
    Unique term coverage: term coverage using a subset in which each term occurs at most once.
  Relation coverage: the extent to which actual relations between concepts are represented in the TS’ content, provided that
  they can be represented considering the semantic model of the TS.
Concept correctness: The extent to which the concepts that exist in the TS are non-redundant, non-vague and non-ambiguous.
Term correctness: The extent to which the terms that are attached to concepts in the TS are free of textual errors and attached to
the right concepts.
Relation correctness: The extent to which the relations that exist in the TS are consistent and in accordance with the factual
relations between concepts.
Table 1 Definitions for coverage and correctness of (the elements of) the content of a TS
2.2 Evaluation Methods
Table 2 contains a list of aspects of TSs that
have been evaluated by others. These aspects
have been categorized according to the
aspects of the coverage and correctness of
TSs’ content that were defined in Table 1.
Most of the evaluation studies described in
the literature focus on the coverage of either the
concepts or the terms in a TS.
The coverage of concepts or terms is
often evaluated through ‘concept matching’
or ‘term matching’ [7, 11-13, 17-19, 22].
Matching implies that a representative
sample of concepts or terms is extracted
from the domain in which the TS is being
used, e.g. the diagnoses of patients at the
oncology department. The representative
sample can be randomly or non-randomly
chosen. The concepts from the sample are
then matched with those of a TS. The extent
to which concepts or terms in the sample can
be matched with concepts in the TS is
mostly presented by means of matching categories.
For example Chute et al. [12] have
applied a scoring scale for the matching of
concepts from 0 to 2, where 0 = no match,
1 = fair match, 2 = complete match. A different
categorization of matches was used
by Warnekar et al. [29]. According to this
categorization matches could be exact
matches, lexical or semantic matches, or
no-matches. Wasserman et al. [30] distinguished,
apart from the exact matches, synonyms
and no-matches that required the addition
to the hierarchy of a new ‘leaf’, a new
‘leaf with multiple stems’ or a ‘graft to an
existing branch’. The content coverage can
be represented, for example, by calculating
the percentage of perfect matches. To be
representative, the source of the sample of
concepts or terms should reflect the intended
use of the TS. For example if a TS
will be used by nurses for documentation of
nursing information, then the sample of
concepts could well be extracted from
existing nursing documentation in medical
records [11, 17]. While in most cases the
matching process is performed by humans,
Penz et al. [32] applied two automated mapping
tools. Penz et al. were not able to assess
their overall value, due to high frequencies
of spelling errors and jargon in their sample.
In a study by Brown et al. [31] one of the
automated mapping tools, the SmartAccess
Vocabulary Server (SAVS), proved to
be reliable.
Fig. 1 Example of a terminological system and a representative sample containing concepts that are being matched to the
terminological system to evaluate the concept coverage
Row groups: Coverage (Concepts, Terms, Relations); Correctness (Concepts, Terms, Relations)
Aspect: N (number of studies that evaluated the aspect)
  Concept coverage: 14
  Consistent level of abstraction: 1
  Term coverage: 10
  Coverage of relationships: 7
  Availability of definitions: 1
  Completeness of definition: 1
  Non-redundancy: 3
  Lexical consistency: 1
  Appropriate terms: 3
  Accurate hierarchical relations: 1
  Consistent hier. relations: 6
  Sensible and appropriate hier. pairs: 2
  Non-redundant hier. relations: 1
  Non-circular hier. relations: 3
  Consistent categorization: 2
  Relevant categorization: 1
  Unambiguous categorization: 1
  Accurate non-hierarchical relations: 1
  Accurate definitions: 1
Study [reference]: number of aspects marked X
  Bakken 1994 [11]: 2
  Chute 1996 [12]: 1
  Campbell 1997 [7]: 4
  Humphreys 1997 [13]: 2
  Bodenreider 1998 [14]: 4
  Cimino 1998 [15]: 3
  Cimino 1999 [16]: 3
  Bowles 2000 [17]: 5
  Cimino 2001 [18]: 3
  Hardiker 2001 [19]: 3
  Schulz 2001 [20]: 2
  Bodenreider 2001 [21]: 1
  Bodenreider 2002 [22]: 2
  Bodenreider 2002 [23]: 2
  Cornet 2002 [24]: 2
  Moss 2003 [25]: 1
  Hardiker 2003 [26]: 5
  Hardiker 2003 [27]: 1
  Bodenreider 2003 [28]: 2
  Warnekar 2003 [29]: 2
  Wasserman 2003 [30]: 3
  Brown 2004 [31]: 1
  Penz 2004 [32]: 1
  Ceusters 2004 [33]: 5
Table 2 Categorization of quality aspects of the content of TS, as evaluated by others
Besides the ‘matching method’, other
methods have been applied to evaluate the
concept coverage or term coverage. In a
study by Bodenreider et al. [14] the system
being evaluated had already been in use for
some time. The measure of coverage of concepts
in the system was based on the number
of concepts that had to be added to the system
by the users due to under-representation
in the TS. In addition, they evaluated the
coverage of hierarchical and other relations
within the UMLS. They designed an algorithm
to automatically extract the UMLS
concepts that are related to procedures in a
particular domain. Starting with a few concepts
related to the domain, the algorithm
recursively selected all their subordinate
concepts. This navigation was based on the
relations between concepts. Missing relations
resulted in silence: a concept might
seem not to exist in the UMLS only because
it is not related to another concept. The
number of missing relations was estimated
by comparing the concepts in the subset retrieved
by the algorithm to the concepts that
are needed for the representation of the procedures
within the domain of interest.
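The idea of navigating from seed concepts to their subordinates, and of reading missing relations from what is not reached, can be sketched as follows. This is a simplified illustration and not the algorithm of Bodenreider et al.; the function `collect_subordinates`, the toy hierarchy and the set `needed` are hypothetical, and the traversal is written iteratively rather than recursively.

```python
def collect_subordinates(children, seeds):
    """Return the seeds plus every concept reachable through child links."""
    selected, stack = set(), list(seeds)
    while stack:
        concept = stack.pop()
        if concept not in selected:
            selected.add(concept)
            stack.extend(children.get(concept, ()))
    return selected


# Hypothetical hierarchy: 'biopsy' exists in the system but has no parent link.
children = {
    "procedure": ["surgical procedure", "diagnostic procedure"],
    "surgical procedure": ["appendectomy"],
}
# Concepts assumed to be needed to represent the procedures in the domain.
needed = {"appendectomy", "biopsy", "diagnostic procedure"}

retrieved = collect_subordinates(children, ["procedure"])
silent = needed - retrieved  # needed concepts the navigation could not reach
print(sorted(silent))
```

Concepts that end up in `silent` either do not exist in the system or exist without the relation that would make them reachable, which is the distinction the comparison described above tries to estimate.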
Whereas Bodenreider used the relations
between concepts to evaluate the content
coverage, these relations are more often
used for the evaluation of the correctness of
the content. Many TSs nowadays consist of
more than just a simple list of terms; hierarchical
and non-hierarchical relations exist
between the concepts. By looking at the relations
between concepts, inconsistent, ambiguous
or redundant concepts may be revealed.
For example, if two individual concepts
share the same meaning, they are actually
redundant concepts. If ‘polyneuropathy’
and ‘polyneuritis’ were each defined
as separate concepts one could find
that they share the same relations to other
concepts, and that they actually refer to the
same disease. In this case one of the concepts
‘polyneuropathy’ and ‘polyneuritis’ is
redundant. In case of a hierarchical structuring
of the concepts, a concept might be inconsistently
classified if it has relations
which are in conflict with the relations of its
superordinate concept. For example, inconsistency
might occur if a superordinate disease
is defined to be caused by a bacterium,
whereas the subordinate disease is defined
to be caused by a virus. The inconsistency
here becomes apparent if ‘virus’ and ‘bacterium’
are explicitly made mutually exclusive.
Cimino used this kind of method to
detect ambiguities, redundancy and inconsistent
hierarchical relations within the
UMLS [15]. In addition, Bodenreider et al.
[14] stated that hierarchical relations between
concepts can be used for the evaluation
of the categorization of concepts.
Their evaluation was based on the idea that
concepts inherit properties from their superordinate
concept and thus a concept is supposed
to belong at least to the same category
or categories as its superordinate concept.
Evaluation based on relations between concepts
has the potential to be automated or
semi-automated. For example a computer
algorithm could detect concepts that share
the exact same definitions or concepts that
were assigned to a number of semantic
types, of which two are mutually exclusive.
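Both automated checks mentioned here can be sketched in a few lines of Python. The sketch assumes that a definition is simply a set of (relation, value) pairs and that mutually exclusive semantic types are given as explicit pairs; the function names and all data are hypothetical and do not reproduce the method used in any of the cited studies.

```python
from collections import defaultdict
from itertools import combinations


def redundant_groups(definitions):
    """Group concepts whose sets of (relation, value) pairs are identical."""
    groups = defaultdict(list)
    for concept, relations in definitions.items():
        groups[frozenset(relations)].append(concept)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def exclusive_type_conflicts(semantic_types, disjoint_pairs):
    """Return concepts assigned to two semantic types declared mutually exclusive."""
    disjoint = {frozenset(pair) for pair in disjoint_pairs}
    conflicts = {}
    for concept, types in semantic_types.items():
        clashes = [pair for pair in combinations(sorted(types), 2)
                   if frozenset(pair) in disjoint]
        if clashes:
            conflicts[concept] = clashes
    return conflicts


# Hypothetical definitions: the first two concepts are defined identically.
definitions = {
    "polyneuropathy": {("is_a", "neuropathy"), ("has_location", "peripheral nerves")},
    "polyneuritis": {("is_a", "neuropathy"), ("has_location", "peripheral nerves")},
    "pneumonia": {("is_a", "infection"), ("has_location", "lung")},
}
semantic_types = {"concept A": {"Virus", "Bacterium"}, "concept B": {"Bacterium"}}

print(redundant_groups(definitions))
print(exclusive_type_conflicts(semantic_types, [("Virus", "Bacterium")]))
```

Such a check only flags candidates: identical definitions may also mean that the definitions are incomplete rather than that the concepts are truly redundant, so the output would still need human review.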
To enable automated evaluation, the TS
content (especially the relations between
concepts) should be represented in a formal
way. Examples of formal representations
can be found in the SNOMED-CT [34] and
the GALEN terminologies [35]. SNOMED-CT
and GALEN both use a Description
Logic to represent their knowledge. In a
study by Cornet et al. [24] migration of content
representation from frame-based to
Description-Logic-based proved to be
valuable in determining redundancies in
concept definitions and in forcing the
knowledge modeler to be aware of ambiguities.
Schulz and Hahn [20] have expressed
UMLS knowledge in Description
Logic. They provide evidence that embedding
the knowledge into a formal reasoning
framework is effective in identifying inconsistencies.
Bodenreider et al. [23] applied another
approach to the automated detection
of inconsistencies, by using the lexical
knowledge contained in a terminological
system. They assume that all terms are
composed of a modifier, such as ‘primary’
or ‘secondary’, and a context, a noun phrase
such as ‘adrenocortical insufficiency’. This
would result in the terms ‘primary adrenocortical
insufficiency’ and ‘secondary adrenocortical
insufficiency’. They hypothesize
that terms of the form modifier1-context
and modifier2-context are co-hyponyms of
the term ‘context’. E.g. ‘primary adrenocortical
insufficiency’ and ‘secondary adrenocortical
insufficiency’ are hyponyms of
‘adrenocortical insufficiency’. They base
their evaluation on the premise that in a consistent
terminology the terms modifier1-context
and modifier2-context should be 1) both
present and 2) in hierarchical relation with
the term ‘context’. The conclusion of this
study was that this method alone is not sufficient
for ensuring the consistency of a TS.
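The two conditions of this lexical check can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions, not the procedure of Bodenreider et al.: terms are plain strings of the form "modifier context", only direct parents are inspected, and the function name and data are hypothetical.

```python
def lexical_inconsistencies(terms, parents, modifier_pairs):
    """Check paired-modifier terms for presence and for a link to their context.

    terms: set of term strings in the terminology.
    parents: mapping term -> set of direct superordinate terms.
    modifier_pairs: pairs of modifiers expected to occur together.
    """
    issues = []
    for m1, m2 in modifier_pairs:
        for first, second in ((m1, m2), (m2, m1)):
            for term in sorted(terms):
                if not term.startswith(first + " "):
                    continue
                context = term[len(first) + 1:]
                sibling = f"{second} {context}"
                # Condition 1: the term with the paired modifier should exist.
                if sibling not in terms:
                    issues.append(("missing co-hyponym", sibling))
                # Condition 2: the term should be subordinate to its context.
                if context not in parents.get(term, set()):
                    issues.append(("no hierarchical link", term, context))
    return issues


terms = {
    "adrenocortical insufficiency",
    "primary adrenocortical insufficiency",
    "secondary adrenocortical insufficiency",
    "hypothyroidism",
    "primary hypothyroidism",
}
parents = {
    "primary adrenocortical insufficiency": {"adrenocortical insufficiency"},
    "secondary adrenocortical insufficiency": {"adrenocortical insufficiency"},
    "primary hypothyroidism": set(),
}

for issue in lexical_inconsistencies(terms, parents, [("primary", "secondary")]):
    print(issue)
```

In the hypothetical data the adrenocortical terms satisfy both conditions, whereas ‘primary hypothyroidism’ lacks both its ‘secondary’ counterpart and a link to ‘hypothyroidism’.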
A completely different method to eval-
uate the correctness of relations between
concepts was applied by Campbell et al.
who evaluated the clinical utility of pairs
of hierarchically related concepts within
three medical terminological systems
(SNOMED, READ and UMLS) [7]. Six
clinician-informatics specialists manually
reviewed random samples of hierarchical
pairs. They used a five-point Likert scale
(1 = extremely dissatisfied with pairing,
3 = neutral, 5 = extremely satisfied with
pairing) to rate the clinical utility of each
pair. Review by domain experts was also
applied in a study by Hardiker [26].
In summary, we distinguish four evaluation
strategies for the content of TSs:
1) concept matching, 2) evaluation based
on relations between concepts, 3) evaluation
by domain experts, and 4) evaluation based
on lexical knowledge.
Based on this literature review on
methods for evaluation of TSs we selected
three methods for the case study, which will
be described below. The three methods reflect
the first three of the above-mentioned
evaluation strategies. The methods
were chosen because previous studies deem
them promising and because they were applicable
to our case study with the DICE TS.
3. Case Study
3.1 Background
A study by de Keizer et al. in 1998 has
shown that none of the contemporary TSs
met the criteria of a TS for Dutch intensive
care [6]. This has been the motivation to develop
a new TS, Diagnoses for Intensive
Care Evaluation (DICE). The TS DICE
comprises reasons for admission to the In-
tensive Care Unit (ICU), and some of their
characteristics, such as the anatomical lo-
calization, the dysfunction and the etiology.
The DICE TS contains 2373 concepts, of
which 1456 are diagnoses that form reasons
for admission to the ICU. Other concepts in-
clude for example anatomical locations and
causes of disease. There are 50 relation
types. Thirteen of these, for example
“has_anatomical_localization”, may be
used for any of the diagnoses. The other 37
relation types are attributes which are spe-
cific for certain diagnoses, for example the
chronicity (e.g. acute, chronic) of organ
dysfunction. Currently a total of 10,425 re-
lations between concepts have been de-
fined. DICE can be incorporated into Pa-
tient Data Management Systems to facili-
tate documentation by physicians, and it is
intended to be used for patient selection and
aggregation for medical research and man-
agement overviews. Two domain experts,
i.e. intensive care physicians, and two medical
informaticians started seven years ago
with a rather simple hierarchy of reasons for
ICU admission derived from the ICNARC
Coding Method [36]. Due to the complexity
of concepts in the domain, the need for a
separation of concepts and terms, and the
need for a structure to enable aggregation of
homogeneous patient groups we chose to
specify the concepts and their characteristics
more formally. The DICE content was
therefore converted to a frame-based structure.
In the development process of DICE
we are currently at the stage where we need
to evaluate to what extent the current con-
tent of DICE meets the requirements, in
terms of coverage and correctness, for the
intended use of the system.
3.2 Methods
In the case study we will apply three
methods: concept matching, formal algo-
rithmic evaluation and expert review, to
evaluate the coverage and the correctness of
the DICE content and to analyze the extent
to which these methods overlap or comple-
ment each other. We will compare the over-
all coverage and correctness measured by
the three methods. For methods 2 and 3 the
individual missing or incorrect concepts,
terms and relations revealed by each method
are compared. The methods used are described
below.
3.2.1 Method 1: Concept Matching
In this study concept matching was carried
out twice with different representative
samples of concepts that were matched to
the TS. The samples differed in the sources
from which they were retrieved. The two
sources reflected the two distinct purposes
of the system, i.e. 1A) the documentation of
and communication about patients’ rea-
son(s) for admission and 1B) the selection
of homogeneous (with respect to diagnosis)
patient groups for clinical research and
management. Concepts from these two rep-
resentative samples were matched to the
content of DICE. Concepts found in DICE
could be 1) a perfect match, 2) related (e.g.
mitral valve instead of tricuspid valve),
3) too narrow in meaning (e.g. subarachnoid
haemorrhage instead of intracranial
haemorrhage), 4) too general in meaning
(e.g. polyneuropathy instead of infectious
polyneuropathy), or 5) a concept could not
be coded at all. We applied this categorization
because in case of a suboptimal
match it enabled us to be specific about why
a match was not a ‘perfect match’. The distribution
of the concepts among the matching
categories was calculated when using
all diagnoses occurring in the sets and when
using only the unique diagnoses. DICE
offers the users the opportunity to compose
new (post-coordinated) concepts out of two
or more existing (pre-coordinated) concepts.
In this study the diagnoses within
DICE that were matched to the diagnoses
within the samples could also be post-coordinated
diagnoses.
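How the distribution over the five matching categories is obtained for all occurring diagnoses and for unique diagnoses only can be sketched as follows. The category labels follow the text; the function name, the assumption that a diagnosis always receives the same category, and the example data are hypothetical.

```python
from collections import Counter

CATEGORIES = ("perfect", "related", "too narrow", "too general", "no match")


def category_distribution(assignments, unique=False):
    """Return {category: percentage} for (diagnosis, category) pairs.

    With unique=True each diagnosis is counted once; this assumes that a
    diagnosis is always assigned to the same category.
    """
    pairs = list(assignments)
    if unique:
        pairs = list(dict(pairs).items())
    if not pairs:
        raise ValueError("no assignments given")
    counts = Counter(category for _, category in pairs)
    return {c: round(100 * counts[c] / len(pairs), 1) for c in CATEGORIES}


# Hypothetical coded sample: 'pneumonia' was registered twice.
coded = [
    ("pneumonia", "perfect"),
    ("pneumonia", "perfect"),
    ("sepsis", "perfect"),
    ("infectious polyneuropathy", "too general"),
    ("tricuspid valve insufficiency", "related"),
]

print(category_distribution(coded))               # all occurring diagnoses
print(category_distribution(coded, unique=True))  # unique diagnoses only
```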
Method 1A: Evaluation for Documentation
of Reasons for Admission
DICE was used at the intensive care depart-
ment of the Academic Medical Center in
Amsterdam during March 2001. Attending
intensive care physicians used the system in
real practice to code actual patients’ reasons
for admission. The reasons for admission
that the physicians wanted to record in a patient’s
medical record comprised the representative
sample of concepts. A physician
assigned each concept within the sample to
a matching category to express the extent to
which it was represented in DICE. In case of
a non-perfect match the physician entered
the actual diagnosis in free text. This enabled
checking the correct assignment of
concepts to the matching categories.
Method 1B: Evaluation for Aggregation
of Patient Groups
We collected all diagnoses that formed (a
part of) the inclusion and exclusion criteria of
clinical studies that appeared in two important
intensive care journals (Intensive Care
Medicine and Critical Care Medicine) between
January 1, 2001 and July 1, 2001.
These diagnoses comprised the representative
sample of concepts that was matched
to the TS DICE. The concepts within the
sample were assigned to one of the matching
categories by two of the authors (DA and
EJ), by means of consensus.
3.2.2 Method 2: Formal Algorithmic
Evaluation
As an increasing number of medical terminological
systems is based on formal representation,
it is important to understand the
potential of readily available reasoning algorithms
exploiting the formal representation.
Such algorithms can be instrumental
for evaluation of the content of terminological
systems. One example of deploying such
algorithms is the constraint checking engine
based on Protégé Axiom Language (PAL),
which has been applied to the Gene Ontology
[37]. Whereas in this example a frame-based
representation is used, we have used
the Description Logic (DL) formalism [38].
The publicly available reasoner RACER
[39] was used to perform satisfiability testing,
which provides a means for detecting
mutually conflicting concept definitions. As
the content of DICE has a frame-based representation,
it was migrated to a DL-based
representation, according to the process described
in [40]. In this migration process,
assumptions are made on the semantics of
definitions, such as for example disjointness
of sibling concepts.
For this evaluation we randomly
extracted a sample of 80 pre-coordinated
diagnoses in DICE. The reasoning process
revealed concepts that had inconsistent
definitions, which indicated the presence of
incorrect (hierarchical or non-hierarchical)
relations. For example, if the superordinate
concept infectious polyneuropathy (see
Table 3) was defined to be caused by a virus
and the subordinate leprosy polyneuropathy
was defined to be caused by
Mycobacterium leprae, while it was known that
Mycobacterium leprae is not a virus, then
the subordinate concept would be identified
as inconsistent. The inconsistency here
could have been caused by the fact that the
etiology of the superordinate concept
should also include bacterium instead of
virus alone, or by the fact that leprosy polyneuropathy
should not have been classified
as a subordinate concept of infectious polyneuropathy.
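The reasoning behind this example can be imitated with a deliberately small Python sketch. It is not a Description Logic reasoner and does not reproduce what RACER does; it assumes that every definition restricts the cause of a concept to a single class (as in the fictitious definitions of Table 3), that restrictions are inherited along the hierarchy, and that disjointness is declared explicitly. All function names are hypothetical.

```python
def ancestors(concept, parents):
    """All superordinate concepts of `concept`, nearest first."""
    found, queue = [], list(parents.get(concept, ()))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(parents.get(current, ()))
    return found


def is_a(cls, target, parents):
    """True if `cls` equals `target` or is subordinate to it."""
    return cls == target or target in ancestors(cls, parents)


def unsatisfiable(concept, parents, causes, disjoint):
    """True if own and inherited cause restrictions fall in disjoint classes."""
    lineage = [concept] + ancestors(concept, parents)
    restricted = [causes[c] for c in lineage if c in causes]
    for a in restricted:
        for b in restricted:
            for x, y in disjoint:
                if is_a(a, x, parents) and is_a(b, y, parents):
                    return True
    return False


# The fictitious definitions of Table 3, written as plain dictionaries.
parents = {
    "Infectious_polyneuropathy": ["Polyneuropathy"],
    "Leprosy_polyneuropathy": ["Infectious_polyneuropathy"],
    "Mycobacterium_Leprae": ["Bacterium"],
}
causes = {
    "Infectious_polyneuropathy": "Virus",
    "Leprosy_polyneuropathy": "Mycobacterium_Leprae",
}
disjoint = [("Virus", "Bacterium")]

for name in ("Infectious_polyneuropathy", "Leprosy_polyneuropathy"):
    print(name, unsatisfiable(name, parents, causes, disjoint))
```

Only the subordinate concept is flagged, and nothing in the output says which of the two possible corrections applies. Removing the disjointness declaration makes the conflict disappear, which is why the assumptions made during migration matter for what the reasoner reports.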
3.2.3 Method 3: Expert Review
We printed on paper forms the terms and
(hierarchical and non-hierarchical) rela-
tions belonging to each of the 80 randomly
selected concepts that were also used in the
formal algorithmic evaluation (Fig. 2). Six
domain experts, all experienced intensive care
physicians, manually reviewed the terms
and relations belonging to the concepts. If
they found a missing or incorrect term or relation
they wrote this on the paper forms.
One of the authors (DA) collected and analyzed
the comments of the domain experts.
3.3 Results
3.3.1 Concept Matching
During the study to evaluate the coverage of
DICE for documentation of reasons for admission
(1A) 10 ICU physicians registered a
total of 164 diagnoses, of which 107 were
unique. For the concept matching to evalu-
ate the coverage of DICE for aggregation of
patient groups (1B) we retrieved 218 diag-
noses, of which 187 were unique. The over-
lap of the two samples consisted of eight
unique diagnoses. The distribution of con-
cepts among the matching categories is dis-
played in Figure 3.
Infectious_polyneuropathy ⊆ Polyneuropathy ∩ ∃ hascause Virus ∩ ∀ hascause Virus
Leprosy_polyneuropathy ⊆ Infectious_polyneuropathy ∩ ∃ hascause Mycobacterium_Leprae ∩ ∀ hascause Mycobacterium_Leprae
Mycobacterium_Leprae ⊆ Bacterium
Disjoint (Virus, Bacterium)
Table 3 A fictitious example of (inconsistent) concept definitions in a DL-based terminological system
Fig. 2 Example of a paper form for expert review
Concept matching for documentation of
diagnoses in daily care practice (1A) resulted
in 63% (n = 67) perfect matches when
only uniquely occurring diagnoses were
considered (unique concept coverage) and
74% (n = 121) perfect matches when each
single occurrence of a diagnosis in the
sample was considered (occurring concept
coverage). Concept matching for aggregation
of patient groups (1B) resulted in
lower frequencies of perfect matches (52%
(n = 97) unique, 51% (n = 111) occurring).
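The reported percentages can be retraced from the counts given in this section (164 occurring and 107 unique diagnoses for 1A, 218 occurring and 187 unique diagnoses for 1B). The short Python check below only restates this arithmetic; the dictionary layout is an illustrative choice.

```python
# (perfect matches, sample size) per use case and counting method,
# taken from the counts reported in this section.
perfect_matches = {
    "1A": {"unique": (67, 107), "occurring": (121, 164)},
    "1B": {"unique": (97, 187), "occurring": (111, 218)},
}

percentages = {
    use_case: {kind: round(100 * n / total) for kind, (n, total) in counts.items()}
    for use_case, counts in perfect_matches.items()
}

for use_case, result in percentages.items():
    print(use_case, result)
```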
The frequency of ‘too general’ matches was higher for the concept matching for aggregation of patient groups for clinical research and management (1B) (25% unique, 24% occurring) than for documentation of diagnoses (1A) (14% unique, 10% occurring). The structure of DICE enables the composition of new diagnoses by specifying their non-hierarchical relations (post-coordination). ‘Too general’ matches indicated that one or more non-hierarchical relations that are necessary to enable the post-coordination of that specific diagnosis were missing. DICE enables users to search for a diagnosis based on its characteristics (non-hierarchical relations). For example, a user might search DICE for a diagnosis that is an infection located in the lungs and retrieve the diagnosis Pneumonia. In one case the physicians appeared to be unable to find a diagnosis based on its characteristics. This indicated that the characteristics that the physician attached to this diagnosis were not consistent with those in DICE. The concept was available in DICE, but some characteristics were missing. The matching category assigned by the physician to this diagnosis was incorrect, i.e. ‘no match’ where it should have been ‘too general’. The correct category, ‘too general’, was used in the analysis of the results.
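The difference between unique and occurring concept coverage can be made concrete with a small sketch. The function name, the sample and the assigned categories below are invented for illustration and are not data from the case study; only the category labels are taken from the text.

```python
from collections import Counter

PERFECT = "perfect match"  # category label as used in the text


def concept_coverage(sample, category_of):
    """Return (unique, occurring) share of perfect matches.

    sample      -- list of diagnoses, one entry per registration (repeats allowed)
    category_of -- dict mapping each diagnosis to its matching category
    """
    occurrences = Counter(sample)
    perfect = [d for d in occurrences if category_of[d] == PERFECT]
    # Unique coverage: every distinct diagnosis counts once.
    unique = len(perfect) / len(occurrences)
    # Occurring coverage: every registration counts, so frequent diagnoses weigh more.
    occurring = sum(occurrences[d] for d in perfect) / len(sample)
    return unique, occurring


# Hypothetical sample: one frequent diagnosis and two that occur once.
sample = ["pneumonia", "pneumonia", "pneumonia", "sepsis", "rare syndrome"]
category_of = {
    "pneumonia": "perfect match",
    "sepsis": "too general",
    "rare syndrome": "no match",
}
unique, occurring = concept_coverage(sample, category_of)
print(f"unique: {unique:.0%}, occurring: {occurring:.0%}")
```

In this made-up sample the occurring coverage is higher than the unique coverage because the diagnoses without a perfect match are the infrequent ones, which is the pattern reported above for sample 1A.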
3.3.2 Formal Algorithmic Evaluation and
Expert Review
The formal algorithmic evaluation of the 80 concepts randomly selected from DICE revealed 28 concepts with inconsistent definitions. Ten inconsistencies were due to erroneous assumptions made during the migration of DICE from the frame-based to the DL-based representation. The remaining 18 inconsistent concepts were caused by 32 errors consisting of ten missing relations, ten incorrect relations and twelve relations that were specified as ‘defining’ instead of ‘qualifying’ relations. An example of the latter is ‘pericarditis’, which was defined as always being an infection (defining relation), whereas in fact this is not always the case, i.e. ‘pericarditis’ can be an infection (qualifying relation). Pericarditis is an infection only if a bacterium or a virus caused the inflammation.
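To illustrate the kind of inconsistency that formal algorithmic evaluation detects, the fictitious definitions of Table 3 can be checked with a toy procedure. This is a deliberately simplified sketch and not a description logic reasoner such as the one used in the case study: it only looks for a role that has an existential filler and a universal filler belonging to disjoint concepts. The dictionary layout and all function names are assumptions made for this example.

```python
# Fictitious definitions of Table 3, encoded in plain dictionaries.
PARENTS = {
    "Infectious_polyneuropathy": ["Polyneuropathy"],
    "Leprosy_polyneuropathy": ["Infectious_polyneuropathy"],
    "Mycobacterium_Leprae": ["Bacterium"],
}
SOME = {  # existential restrictions: concept -> [(role, filler)]
    "Infectious_polyneuropathy": [("hascause", "Virus")],
    "Leprosy_polyneuropathy": [("hascause", "Mycobacterium_Leprae")],
}
ONLY = {  # universal restrictions: concept -> [(role, filler)]
    "Infectious_polyneuropathy": [("hascause", "Virus")],
    "Leprosy_polyneuropathy": [("hascause", "Mycobacterium_Leprae")],
}
DISJOINT = [("Virus", "Bacterium")]


def ancestors(concept):
    """Return the concept itself plus all of its transitive parents."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def are_disjoint(a, b):
    """True if a and b fall under two concepts that were declared disjoint."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return any(
        (x in anc_a and y in anc_b) or (y in anc_a and x in anc_b)
        for x, y in DISJOINT
    )


def inherited(restrictions, concept):
    """Collect the restrictions stated on the concept and on its ancestors."""
    return [r for c in ancestors(concept) for r in restrictions.get(c, [])]


def is_inconsistent(concept):
    """Flag a concept whose required filler cannot satisfy an 'only' restriction."""
    for role, some_filler in inherited(SOME, concept):
        for only_role, only_filler in inherited(ONLY, concept):
            if role == only_role and are_disjoint(some_filler, only_filler):
                return True
    return False


for name in ("Infectious_polyneuropathy", "Leprosy_polyneuropathy"):
    print(name, "inconsistent" if is_inconsistent(name) else "consistent")
```

Under these assumptions only Leprosy_polyneuropathy is flagged: it inherits a required viral cause, while all of its causes must be Mycobacterium_Leprae, a bacterium, and Virus and Bacterium are disjoint.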
For the 80 selected concepts the six domain experts found 397 missing relations, 123 incorrect relations, 70 relations that were specified as ‘defining’ instead of ‘qualifying’ relations, five missing terms, 15 incorrect terms and two diagnoses that should be deleted from the TS content. Of the total of 612 unique discrepancies 173 (28%) were found by two or more domain experts and 83 (14%) were found by three or more domain experts. Twenty-four errors were found both with the formal algorithmic evaluation and by the expert review. Table 4 displays the numbers of the different types of errors and omissions that were discovered by each of the three methods applied in this study.
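The counts reported above can be recombined in a few lines of arithmetic. The sketch below uses only numbers stated in this section. It assumes that the errors of the formal algorithmic evaluation and the discrepancies of the expert review are counted in the same unit, so that the 24 shared findings can be subtracted from either total; that assumption is ours and is not stated in the text.

```python
# Counts as reported in Section 3.3.2.
expert_discrepancies = 612   # unique discrepancies found by expert review
by_two_or_more = 173         # found by at least two domain experts
by_three_or_more = 83        # found by at least three domain experts
formal_errors = 32           # errors behind the inconsistent definitions
found_by_both = 24           # found by formal evaluation and by expert review


def percentage(part, whole):
    """Share of `part` in `whole`, rounded to a whole percentage."""
    return round(100 * part / whole)


share_two = percentage(by_two_or_more, expert_discrepancies)
share_three = percentage(by_three_or_more, expert_discrepancies)

# Derived figures (assumption: both methods count findings in the same unit).
by_one_expert_only = expert_discrepancies - by_two_or_more
formal_only = formal_errors - found_by_both
combined = expert_discrepancies + formal_errors - found_by_both

print(f"two or more experts: {share_two}%, three or more: {share_three}%")
print(f"reported by a single expert: {by_one_expert_only}")
print(f"found only by formal evaluation: {formal_only}, combined total: {combined}")
```

Under this assumption most discrepancies were reported by a single expert, and the formal algorithmic evaluation contributed a small number of findings that the experts did not report.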
Fig. 3 Distribution of concepts among the matching categories. The matching categories represent the extent to which concepts in the TS DICE matched concepts within representative samples for (1A) documentation of diagnoses in daily care practice and (1B) aggregation of patient groups for clinical research and management, including post-coordination, split-up into unique and occurring concept matching.

Table 4 Numbers of missing or incorrect concepts, terms or relations discovered by the three methods

Quality aspect of TS content    Concept matching 1A   Concept matching 1B   Formal algorithmic evaluation   Expert review
                                (164 concepts)        (218 concepts)        (80 concepts)                   (80 concepts)
Coverage: concepts              24                    49                    -                               5
Coverage: terms                 9                     4                     -                               5
Coverage: relations             19                    58                    10                              392
Correctness: concepts           -                     -                     -                               2
Correctness: terms              -                     -                     -                               15
Correctness: relations          -                     -                     22                              193
Total number of errors found    52                    111                   32                              612

4. Discussion
Over the past 15 years much has been written about the content of TSs. Requirements for the content of TSs and evaluations of the quality of the content of TSs have been described. This paper provides an overview of what aspects determine the quality of a TS’ content and how these aspects can be evaluated. The aspects most often evaluated were the coverage of terms and of concepts [7, 11-14, 16, 17, 22, 25, 26, 29-32]. Lately the coverage and the correctness of relations between concepts have received increasing attention, especially by Bodenreider et al. [14, 21-23, 28] and Hardiker [19, 26, 27].
The case study described in this paper provides a comparison of three methods that can be applied to evaluate the coverage and/or the correctness of the content of TSs. The first method consisted of two ‘concept matching’ studies which only differed in the origin of the representative samples of concepts to be matched. The overlap of the two samples was relatively small. The matching categories assigned by the physicians for evaluation of DICE for documentation of diagnoses (1A) were checked by the same two people that assigned the categories for the evaluation of DICE for aggregation of patient groups (1B). This makes the assigned matching categories comparable between the two studies. Only once was the assigned matching category found to be incorrect. This indicates that the method, ‘concept matching’ by physicians, produces reliable results.
Differences in the results of the two ‘concept matching’ studies especially concerned the percentage of perfect matches and the percentage of concepts that were found to be too general in meaning. Hales et al. [41] have asserted that the quality of a TS is defined relative to its intended use, which is a major barrier to the evaluation of TSs. The intended use of a TS specifically plays an important role in the evaluation of its content. The results of this study endorse the assertions of Hales et al. The quality of the content of DICE did appear to be relative to the purpose of the system, and it appeared that in the case of DICE we could not rely on a single measure (1A or 1B) to get a complete overview of the coverage of the DICE content.
When interpreting the results of the ‘concept matching’ evaluation as applied here one needs to keep in mind that it concerns only a sample of concepts. What we measure by concept matching is the ‘concept coverage’, which is merely an approximation of the completeness of all concepts that belong to the domain of interest (‘domain completeness’). In view of the methods for retrieval of the representative samples of concepts (or terms) there is a chance that the sample does not contain concepts that only rarely occur in practice. However, the question of whether rarely occurring concepts are represented in a TS might not be as important as whether concepts that frequently occur are represented. The frequencies of occurrence of concepts in practice can be taken into account by using each single occurrence of a concept within the representative sample for (occurring) concept matching. The increase in concept coverage when measuring the occurring instead of the unique concept coverage (Fig. 4) indicates that, in the case of DICE, the concepts that were not represented were mostly the infrequently occurring concepts.
There appeared to be large differences between the numbers of errors or omissions found by the formal algorithmic evaluation and those found by the domain experts. The physicians identified many more errors and omissions than the formal algorithmic evaluation. The difference stems from the fact that the formal algorithmic evaluation only revealed logically incorrect definitions, whereas the physicians also identified wrong terms and logically correct but clinically incorrect definitions. For example, if the definition of encephalopathy stated that it always involves a state of coma, then the formal algorithmic evaluation would render this correct. The physician, however, would not agree with this definition. Instead (s)he would rather say that a patient suffering from encephalopathy could be in a state of coma.
The formal algorithmic evaluation, as it was performed here, has some other drawbacks. The migration of DICE from a frame-based to a DL-based representation required a number of assumptions. In our case these assumptions made some concepts appear inconsistent, whereas they actually were not. Another shortcoming was that the formal algorithmic evaluation as presented here could only identify the inconsistent definitions. The pinpointing of the actual causes of the inconsistencies had to be done manually. We are currently working on ways to automate this identification process [42, 43].
The major drawback of the expert review was that it took the physicians much time to look carefully at all the terms and relations belonging to a concept. Similarly, the analysis of the comments generated by the physicians was a very time-consuming effort. A large number of the comments were given by only one reviewer. Because in most cases only comments on which reviewers have reached consensus will be processed, a large part of the comments will not be used for updating the TS. It is arguable how many reviewers have to be involved in the consensus process and how many reviewers have to agree on a comment before it is processed. Automated detection of inconsistent concepts, such as the formal algorithmic evaluation as it was applied in this study, does have the potential to limit and focus the effort of domain experts in the reviewing process. However, this requires that these methods are further explored. We will consider this in our future research. To decrease the efforts for physicians to review the content of a TS we have started the implementation of a system for ‘internet-based terminological knowledge reviewing’, called KEBoRT (Knowledge Editorial Board on-line Reviewing Tool), which will be used for reviewing the entire DICE content [44].
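The question of how many reviewers have to agree before a comment is processed can be explored with a simple threshold filter. The sketch below is only an illustration: the tuple layout, the function name, the reviewer labels and the example comments are hypothetical and do not describe how comments were actually stored or how KEBoRT works.

```python
from collections import defaultdict


def comments_with_consensus(comments, min_reviewers):
    """Keep the comments that at least `min_reviewers` distinct reviewers made.

    comments -- iterable of (reviewer, concept, remark) tuples
    Returns a dict mapping (concept, remark) to the sorted reviewer names.
    """
    reviewers_per_comment = defaultdict(set)
    for reviewer, concept, remark in comments:
        reviewers_per_comment[(concept, remark)].add(reviewer)
    return {
        key: sorted(reviewers)
        for key, reviewers in reviewers_per_comment.items()
        if len(reviewers) >= min_reviewers
    }


# Hypothetical review comments on two concepts.
comments = [
    ("reviewer 1", "pericarditis", "relation 'is an infection' should be qualifying"),
    ("reviewer 2", "pericarditis", "relation 'is an infection' should be qualifying"),
    ("reviewer 2", "pneumonia", "missing term"),
    ("reviewer 3", "pneumonia", "incorrect relation"),
]

agreed = comments_with_consensus(comments, min_reviewers=2)
for (concept, remark), reviewers in agreed.items():
    print(concept, "|", remark, "|", ", ".join(reviewers))
```

Raising `min_reviewers` reduces the number of comments to process but discards every remark made by a single reviewer, which is the trade-off discussed above.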
A shortcoming of the case study is that the evaluation methods were only applied to the TS DICE. There is a chance that the large number of errors and omissions identified by the expert reviewers was due to the fact that the current content of DICE is not based on expert consensus. Instead, the developers of DICE consulted two domain experts when building the TS. It might be that expert review as it was applied in this case study would produce fewer comments if all concepts, terms and relations had been approved, by means of consensus between a larger number of domain experts, before they were included in the TS. This should be considered when generalizing the conclusions of this case study regarding the number of errors found by expert review.
Looking at the results produced by the three methods we see that the ‘concept matching’ methods, which were originally designed to evaluate the coverage of concepts, also revealed some missing terms and (non-hierarchical) relations. The formal algorithmic evaluation revealed only missing and incorrect relations. The expert review revealed mainly missing and incorrect relations, but also a small number of missing and incorrect concepts and terms.
5. Conclusion
Evaluation studies of the content of TSs mostly concern ‘term matching’ or ‘concept matching’. More sophisticated methods are being explored. Independent of the method used, it remains important to define exactly what is being measured. The definitions provided in this article could be a starting point in this.
Based on the results of the case study it seems that expert review is most complete in evaluating the quality of the content of TSs. Expert review produces results for all aspects that determine the quality of the content. However, the expert review method has some major drawbacks, of which the most important is the fact that it is very time-consuming. In order to get a good overview of the quality of the content of a TS, it is preferable to use a combination of evaluation methods. The ‘concept matching’ method seems to be most useful to determine the coverage of the concepts and terms in the TS. However, it is important to consider the source that was used to retrieve the sample of concepts that are being matched to the TS. The intended purpose of the TS should determine the source of the sample. Different sources can lead to different results regarding the quality of the content of a TS. In addition, a clear description of the applied concept matching method is necessary. Formal algorithmic evaluation has the potential to decrease the workload of human reviewers, but further research is required to explore this potential. From this study it became clear that each method has its strengths and weaknesses. Therefore the three methods should be used in combination with each other.
Acknowledgments
We thank the members of the board of the Dutch National Intensive Care Evaluation (NICE) foundation and Margreeth Vroom, the head of the department of intensive care of the Academic Medical Center in Amsterdam, for their cooperation in this study. We thank the Dutch Ministry of Health, Welfare and Sports for their financial support. The preliminary results of this study have been presented in a reduced form in the AMIA 2003 and the Medinfo 2004 proceedings [45, 46].
References
1. De Keizer N, Abu-Hanna A, Zwetsloot-Schonk J.
Understanding terminological systems I: ter-
minology and typology. Meth Inform Med 2000;
39 (1): 16-21.
2. World Health Organization. International Clas-
sification of Diseases, manual of the International
Statistical Classification of diseases, injuries and
causes of death: 9th revision; 1977.
3. Cote R, Robboy S. Progress in medical information management. Systemized nomenclature of medicine (SNOMED). JAMA 1980; 243: 756-62.
4. Cote R, Rothwell D, Palotay J, Beckett R, Brochu
L. The systemized nomenclature of human and
veterinary medicine: SNOMED International.
College of American Pathologists, 1993.
5. North American Nursing Diagnosis Association.
NANDA nursing diagnoses: Definitions and
classifications, 1999-2000. Philadelphia, PA:
NANDA, 1999.
6. De Keizer N, Abu-Hanna A, Cornet R, Zwetsloot-Schonk J, Stoutenbeek C. Analysis and design of an ontology for intensive care diagnoses. Methods Inf Med 1999; 38 (2): 102-12.
7. Campbell J, Carpenter P, Sneiderman C, Cohn S,
Chute C, Warren J. Phase II evaluation of clinical
coding schemes: Completeness, taxonomy, map-
ping, definitions, and clarity. J Am Med Inform
Assoc 1997; 4 (3): 238-51.
8. Chute C, Cohn S, Campbell J. A framework for comprehensive health terminology systems in the United States. Development guidelines, criteria for selection, and public policy implications. J Am Med Inform Assoc 1998; 5 (6): 503-10.
9. Cimino J. Desiderata for controlled medical vocabularies in the twenty-first century. Meth Inform Med 1998; 37 (4-5): 394-403.
10. ISO/TC215 WG 3. Standard Specification for
Quality Indicators for Controlled Health Vocabu-
laries; 2000 July. Report No.: TS17117.
11. Bakken Henry S, Holzemer W, Reilly C, Camp-
bell K. Terms used by nurses to describe patient
problems: Can SNOMED III represent nursing
concepts in the patient record? J Am Med Inform
Assoc 1994; 1 (1): 61-74.
12. Chute C, Cohn S, Campbell K, Oliver D, Campbell J. The content coverage of clinical classifications. J Am Med Inform Assoc 1996; 3 (3): 224-33.
13. Humphreys B, McCray A, Cheh M. Evaluating the coverage of controlled health data terminologies: Report on the results of the NLM/AHCPR large scale vocabulary test. J Am Med Inform Assoc 1997; 4 (6): 484-500.
14. Bodenreider O, Burgun A, Botti G, Fieschi M, Le Beux P, Kohler F. Evaluation of the Unified Medical Language System as a medical knowledge source. J Am Med Inform Assoc 1998; 5 (1): 76-87.
15. Cimino J. Auditing the unified medical language system with semantic methods. J Am Med Inform Assoc 1998; 5 (1): 41-51.
16. Cimino J, McNamara T, Meredith T, Broverman C, Eckert K, Moore M, et al. Evaluation of a proposed method for representing drug terminology. In: AMIA 1999: Proceedings of the AMIA Annual Symposium; 1999; Washington, DC. Philadelphia: Hanley & Belfus; 1999. pp 47-51.
17. Bowles K. Application of the Omaha system in
acute care. Res Nurs Health 2000; 23: 93-105.
18. Cimino J, Patel V, Kushniruk A. Studying the
human-computer-terminology interface. J Am
Med Inform Assoc 2001; 8 (2): 163-73.
19. Hardiker N, Rector A. Structural validation of
nursing terminologies. J Am Med Inform Assoc
2001; 8 (3): 212-21.
Correspondence to:
D. G. T. Arts
Academic Medical Center
Department of Medical Informatics
P.O. Box 22700
1100 DE Amsterdam
The Netherlands
E-mail: arts@oebig.at
20. Schulz S, Hahn U. Medical knowledge reengi-
neering – converting major portions of the UMLS
into a terminological knowledge base. Int J Med
Inf 2001; 64: 207-21.
21. Bodenreider O. Circular hierarchical relationships in the UMLS: etiology, diagnosis, treatment, complications and prevention. In: Bakken S, editor. AMIA 2001: Proceedings of the AMIA Annual Symposium; 2001; Washington, USA. Philadelphia: Hanley & Belfus; 2001. pp 57-61.
22. Bodenreider O, Mitchell J, McCray A. Evaluation of the UMLS as a terminology and knowledge resource for biomedical informatics. In: Kohane I, editor. AMIA 2002: Proceedings of the AMIA Annual Symposium; 2002; San Antonio, USA. Philadelphia: Hanley & Belfus; 2002. pp 61-65.
23. Bodenreider O, Burgun A, Rindflesch T. Assess-
ing the consistency of a biomedical terminology
through lexical knowledge. Int J Med Inf 2002;
67: 85-95.
24. Cornet R, Abu-Hanna A. Evaluation of a frame-based ontology. A formalization-oriented approach. In: Engelbrecht R, Surján G, McNair P, editors. MIE2002: Proceedings of Medical Infobahn Europe; 2002; Budapest. Amsterdam: IOS Press; 2002. pp 488-93.
25. Moss J, Coenen A, Etta Mills M. Evaluation of the draft international standard for reference terminology model for nursing actions. J Biomed Inform 2003; 36: 271-8.
26. Hardiker N. Logical ontology for mediating be-
tween nursing intervention terminology systems.
Methods Inf Med 2003; 42: 265-70.
27. Hardiker N. Determining sources for formal nurs-
ing terminology systems. J Biomed Inform 2003;
36: 279-86.
28. Bodenreider O. Strength in numbers: Exploring redundancy in hierarchical relations across biomedical terminologies. In: Musen M, Friedman C, Teich J, editors. AMIA 2003: Proceedings of the AMIA Annual Symposium; 2003; Washington, USA. Philadelphia: Hanley & Belfus; 2003. pp 101-5.
29. Warnekar P, Carter J. HIV terms coverage by a commercial nomenclature. In: Musen M, Friedman C, Teich J, editors. AMIA 2003: Proceedings of the AMIA Annual Symposium; 2003; Washington, DC. Philadelphia: Hanley & Belfus; 2003. p 1046.
30. Wasserman H, Wang J. An applied evaluation of SNOMED CT as a clinical vocabulary for the computerized diagnosis and problem list. In: Musen M, Friedman C, Teich J, editors. AMIA 2003: Proceedings of the AMIA Annual Symposium; 2003; Washington, DC. Philadelphia: Hanley & Belfus; 2003. pp 699-703.
31. Brown S, Elkin P, Rosenbloom S, Husser C, Bauer B, Lincoln M, et al. VA national drug reference terminology: A cross-institutional content coverage study. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, USA. Amsterdam: IOS Press; 2004. pp 477-81.
32. Penz J, Brown S, Carter J, Elkin P, Nguyen V, Sims S. Evaluation of SNOMED coverage of Veterans Health Administration terms. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, CA. Amsterdam: IOS Press; 2004. pp 540-4.
33. Ceusters W, Smith B, Kumar A, Dhaen C. Ontology-based error detection in SNOMED-CT. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, USA. Amsterdam: IOS Press; 2004. pp 482-6.
34. Rothwell D. SNOMED-based knowledge repre-
sentation. Meth Inform Med 1995; 34: 209-13.
35. Rector A, Nowlan W. The GALEN project. Com-
put Methods Programs Biomed 1994; 45 (1-2):
75-8.
36. Young J, Goldfrad C, Rowan K. Development
and testing of a hierarchical method to code the
reason for admission to intensive care units: the
ICNARC coding method. Br J Anaesth 2001; 87
(4): 543-8.
37. Yeh I, Karp P, Noy N, Altman R. Knowledge ac-
quisition, consistency checking and concurrency
control for Gene Ontology (GO). Bioinformatics
2003; 19 (2): 241-8.
38. Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge: University Press; 2003.
39. Haarslev V, Möller R. RACER System Description. In: Massacci F, editor. IJCAR 2001: Proceedings of the International Joint Conference for Automated Reasoning; 2001; Siena. Springer-Verlag; 2001. pp 701-6.
40. Cornet R, Abu-Hanna A. Using Description
Logics for Managing Medical Terminologies. In:
Barahona P, editor. 9th Conference on Artificial
Intelligence in Medicine in Europe, AIME; 2003.
Protaras, Cyprus: Springer; 2003. pp 61-70.
41. Hales J, Schoeffler K. Barriers to evaluation of
clinical vocabularies. In: Cesnik B, editor. Medin-
fo 1998: Proceedings of the 5th world congress on
medical informatics; 1998, Seoul. Amsterdam:
IOS Press; 1998. pp 680-4.
42. Schlobach S, Cornet R. Logical Support for Terminological Modeling. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, CA. Amsterdam: IOS Press; 2004. pp 439-43.
43. Schlobach S, Cornet R. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. In: Cohn A, editor. IJCAI2003: Proceedings of the International Joint Conference on Artificial Intelligence; 2003; Acapulco, Mexico. San Francisco: Morgan Kaufmann Publishers; 2003.
44. Prins A, Arts D, Cornet R, de Keizer N. Internet-
based terminological knowledge maintenance. In:
Fieschi M, Coiera E, Li J, editors. Medinfo 2004:
Proceedings of the 11th world congress on
medical informatics; 2004; San Francisco, CA.
Amsterdam: IOS Press; 2004. p 1820.
45. Arts D, Cornet R, De Jonge E, De Keizer N. Comparison of methods for evaluation of medical terminological systems. In: Musen M, Friedman C, Teich J, editors. AMIA 2003: Proceedings of the AMIA Annual Symposium; 2003; Washington, DC. Philadelphia: Hanley & Belfus; 2003. p 779.
46. Arts D, De Keizer N, De Jonge E, Cornet R. Comparison of methods for evaluation of a medical terminological system. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, CA. Amsterdam: IOS Press; 2004. pp 467-71.
 
Sample Scientific Method Paper Write Up
Sample Scientific Method Paper Write UpSample Scientific Method Paper Write Up
Sample Scientific Method Paper Write UpCynthia Velynne
 
How To Write A Good Conclusion To An Academic Essay - In Summary 10
How To Write A Good Conclusion To An Academic Essay - In Summary 10How To Write A Good Conclusion To An Academic Essay - In Summary 10
How To Write A Good Conclusion To An Academic Essay - In Summary 10Cynthia Velynne
 
Application Example College Essay College Essay Hoo
Application Example College Essay College Essay HooApplication Example College Essay College Essay Hoo
Application Example College Essay College Essay HooCynthia Velynne
 
Template For Briefing Paper How To Write A Briefing
Template For Briefing Paper How To Write A BriefingTemplate For Briefing Paper How To Write A Briefing
Template For Briefing Paper How To Write A BriefingCynthia Velynne
 
Research Integrity On Twitter Avoidi
Research Integrity On Twitter AvoidiResearch Integrity On Twitter Avoidi
Research Integrity On Twitter AvoidiCynthia Velynne
 
Home Essay Writer, Essay,
Home Essay Writer, Essay,Home Essay Writer, Essay,
Home Essay Writer, Essay,Cynthia Velynne
 
College Essay Essay On Importance Of Education In
College Essay Essay On Importance Of Education InCollege Essay Essay On Importance Of Education In
College Essay Essay On Importance Of Education InCynthia Velynne
 

More from Cynthia Velynne (20)

Free Cool Alphabet Letter Designs, Download Free Cool Alphabet Letter
Free Cool Alphabet Letter Designs, Download Free Cool Alphabet LetterFree Cool Alphabet Letter Designs, Download Free Cool Alphabet Letter
Free Cool Alphabet Letter Designs, Download Free Cool Alphabet Letter
 
Analytical Writing Sample. GRE GRE Analytical Writing Samp
Analytical Writing Sample. GRE GRE Analytical Writing SampAnalytical Writing Sample. GRE GRE Analytical Writing Samp
Analytical Writing Sample. GRE GRE Analytical Writing Samp
 
Biography Writing Template Biography Template, Writin
Biography Writing Template Biography Template, WritinBiography Writing Template Biography Template, Writin
Biography Writing Template Biography Template, Writin
 
Writing A Good Personal Reflective Essay - How To
Writing A Good Personal Reflective Essay - How ToWriting A Good Personal Reflective Essay - How To
Writing A Good Personal Reflective Essay - How To
 
Learn How To Write A Truly Impressive Sc
Learn How To Write A Truly Impressive ScLearn How To Write A Truly Impressive Sc
Learn How To Write A Truly Impressive Sc
 
Embossed Letter Sheets Set Of 50 Stationery Shee
Embossed Letter Sheets Set Of 50 Stationery SheeEmbossed Letter Sheets Set Of 50 Stationery Shee
Embossed Letter Sheets Set Of 50 Stationery Shee
 
Erianto OngkoS Briefcase Menulis Menjalin Persatua
Erianto OngkoS Briefcase Menulis Menjalin PersatuaErianto OngkoS Briefcase Menulis Menjalin Persatua
Erianto OngkoS Briefcase Menulis Menjalin Persatua
 
Does Money Buy Happiness -
Does Money Buy Happiness -Does Money Buy Happiness -
Does Money Buy Happiness -
 
Examples Of Science Paper Abs
Examples Of Science Paper AbsExamples Of Science Paper Abs
Examples Of Science Paper Abs
 
Pin On English
Pin On EnglishPin On English
Pin On English
 
Explanatory Essay Short Write With Notes, Organize
Explanatory Essay Short Write With Notes, OrganizeExplanatory Essay Short Write With Notes, Organize
Explanatory Essay Short Write With Notes, Organize
 
Sample Scientific Method Paper Write Up
Sample Scientific Method Paper Write UpSample Scientific Method Paper Write Up
Sample Scientific Method Paper Write Up
 
How To Write A Good Conclusion To An Academic Essay - In Summary 10
How To Write A Good Conclusion To An Academic Essay - In Summary 10How To Write A Good Conclusion To An Academic Essay - In Summary 10
How To Write A Good Conclusion To An Academic Essay - In Summary 10
 
Pin On AVID
Pin On AVIDPin On AVID
Pin On AVID
 
Application Example College Essay College Essay Hoo
Application Example College Essay College Essay HooApplication Example College Essay College Essay Hoo
Application Example College Essay College Essay Hoo
 
Template For Briefing Paper How To Write A Briefing
Template For Briefing Paper How To Write A BriefingTemplate For Briefing Paper How To Write A Briefing
Template For Briefing Paper How To Write A Briefing
 
Case Study Template
Case Study TemplateCase Study Template
Case Study Template
 
Research Integrity On Twitter Avoidi
Research Integrity On Twitter AvoidiResearch Integrity On Twitter Avoidi
Research Integrity On Twitter Avoidi
 
Home Essay Writer, Essay,
Home Essay Writer, Essay,Home Essay Writer, Essay,
Home Essay Writer, Essay,
 
College Essay Essay On Importance Of Education In
College Essay Essay On Importance Of Education InCollege Essay Essay On Importance Of Education In
College Essay Essay On Importance Of Education In
 

Recently uploaded

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 

Recently uploaded (20)

Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 

A Literature Review and a Case Study.pdf

  • 1. Methods for Evaluation of Medical Terminological Systems: A Literature Review and a Case Study
D. G. T. Arts (1), R. Cornet (1), E. de Jonge (2), N. F. de Keizer (1)
(1) Academic Medical Center, Department of Medical Informatics, Amsterdam, The Netherlands
(2) Academic Medical Center, Department of Intensive Care, Amsterdam, The Netherlands
Summary
Objectives: The usability of terminological systems (TSs) strongly depends on the coverage and correctness of their content. The objective of this study was to create a literature overview of aspects related to the content of TSs and of methods for the evaluation of the content of TSs. The extent to which these methods overlap or complement each other is investigated.
Methods: We reviewed literature and composed definitions for aspects of the evaluation of the content of TSs. Of the methods described in literature three were selected: 1) Concept matching, in which two samples of concepts representing a) documentation of reasons for admission in daily care practice and b) aggregation of patient groups for research, are looked up in the TS in order to assess its coverage; 2) Formal algorithmic evaluation, in which reasoning on the formally represented content is used to detect inconsistencies; and 3) Expert review, in which a random sample of concepts is checked for incorrect and incomplete terms and relations. These evaluation methods were applied in a case study on the locally developed TS DICE (Diagnoses for Intensive Care Evaluation).
Results: None of the applied methods covered all the aspects of the content of a TS. The results of concept matching differed for the two use cases (63% vs. 52% perfect matches). Expert review revealed many more errors and incompleteness than formal algorithmic evaluation.
Conclusions: To evaluate the content of a TS, using a combination of evaluation methods is preferable. Different representative samples, reflecting the uses of TSs, lead to different results for concept matching.
Expert review appears to be very valuable, but time consuming. Formal algorithmic evaluation has the potential to decrease the workload of human reviewers but detects only logical inconsistencies. Further research is required to exploit the potential of formal algorithmic evaluation.
Keywords: Terminological systems, evaluation, methods, definitions
Methods Inf Med 2005; 44: 616–25
1. Introduction
Several developments in health care, such as accountability of care and increased use of electronic patient records, have led to an increased need for accurate, detailed and structured registration of medical data. Many terminological systems (TSs) have been and are still being developed to support this. A TS interrelates concepts of a particular domain and provides their terms and codes [1]. The relations between the concepts within a TS can be hierarchical (e.g. Is-A) or non-hierarchical (e.g. has-location). In addition, some TSs hold (formal) rules for the composition of new concepts by combining existing concepts. Examples of medical TSs are the International Classification of Diseases (ICD) [2], the Systematized Nomenclature of Medicine (SNOMED) [3, 4], and the North American Nursing Diagnosis Association (NANDA) terminology [5]. At the direction of the Dutch National Intensive Care Evaluation foundation (NICE; see footnote a), our department is engaged in a continuous effort to develop a TS and corresponding software for the domain of intensive care (IC). This system is called Diagnoses for Intensive Care Evaluation, DICE [6].
For the study described here we distinguish two types of use cases for terminological systems. On the one hand, TSs are used by medical staff to document medical data, e.g. patient characteristics or treatment, in the medical record. On the other hand, TSs are used to select homogeneous patient groups for research or management purposes.
A medical researcher or manager selects from the TS those concepts that define a homogeneous patient group. After selecting the appropriate concepts, the researcher/manager can identify patients that fulfill the criteria by searching their electronic records.
Several authors have specified required characteristics of a TS [7-9]. In 2000 a list of standard requirements for TSs was developed and approved by the International Organization for Standardization (ISO) [10]. In this study we will focus on requirements related to the content of a TS, i.e. the concepts, their terms, and the relations between the concepts. The content of a TS is of utmost importance for its acceptance. A physician needs to be able to be complete and sufficiently accurate in depicting the care process, and clinical researchers need to be able to be complete in selecting specific patient groups at any desired level of aggregation. To realize this, all concepts, terms and relations belonging to the domain of the TS should be represented and should be correct. For example, we want sufficient terms attached to a concept, and we want the terms to be only the correct ones.
A number of methods to evaluate the content of a TS have been described in the literature. A literature study was performed to gain insight into the several types of evaluation methods. The diversity of the terminology used in this context has incited us to compose definitions for the most prominent expressions that are used in this article. In addition, we present three common evaluation methods that focus on (but are not restricted to) the coverage and the correctness of a TS’s content. These three methods have been applied in a case study on the TS DICE [6]. The aim of the case study was to analyze the extent to which the results of the three methods overlap or complement each other. For this we compared the results produced by each method.
Methods Inf Med 5/2005. Received: April 15, 2004; accepted: March 15, 2005
Footnote a: The Dutch National Intensive Care Evaluation (NICE) foundation: www.stichting-nice.nl
© 2005 Schattauer GmbH
  • 2. 2. Literature Study
As mentioned in the introduction, in this study we focus on the evaluation of the content of a TS. To gain insight into methods for evaluation of the content of TSs that have been applied by others, we performed a review of relevant Medline-indexed journal articles by using (combinations of) the following keywords: evaluation, validation, assessment, audit, terminological system, terminology, ontology, classification, thesaurus, nomenclature. In addition, articles were retrieved from reference lists and personal databases. An article was considered relevant if it described the evaluation of the quality of the content of a TS which was developed for a medical domain. Articles were selected from the past ten years.
2.1 Definitions
Our literature study uncovered some inconsistencies in the terminology that is used in this field. We therefore provide this article with some definitions that have been applied in this study.
First of all, we have restricted ourselves to the evaluation of TSs’ content. By our definition, the content of a TS includes concepts, the terms attached to these concepts and the relations between these concepts. ‘Concepts’ can be defined as ‘units of thought formed by the characteristics of objects’. Objects might be concrete things such as the heart valve of patient X or abstract things such as the pain of patient Y. ‘Terms’ are used to designate a concept. The ‘relations’ between concepts can be hierarchical (e.g. Is-A) or non-hierarchical [1].
A term that frequently appears in literature considering the evaluation of TSs is ‘domain completeness’. ‘Domain completeness’ can be defined as the extent to which the content of a TS covers the intended domain.
Domain completeness according to this definition would be hard to measure or to quantify, because the continuous changes in medical knowledge make it impossible to define exactly what comprises a particular medical domain. Lacking this gold standard, it is impossible to determine to what extent a TS is complete in covering the intended medical domain. Instead, we could take a subset of concepts or terms representative for the intended domain and see to what extent this subset is incorporated in a particular TS. This way we will measure the ‘content coverage’.
Table 1 gives the definitions of ‘content coverage’ and of the ‘coverage’ and the ‘correctness’ of the separate elements (i.e. concepts, terms and relations) that comprise the content of a TS.
The definitions of ‘concept coverage’ and ‘term coverage’ might look straightforward; however, the measured coverage can be highly influenced by choices made in the evaluation process. For example, one can choose whether or not to consider the occurrence of specific concepts or terms in real practice. Missing a frequently occurring concept or term might be more severe than missing those which are hardly ever used. We therefore define ‘occurring coverage’ and ‘unique coverage’. For example, in Figure 1 we see that the ‘occurring coverage’ is 80% (4/5), whereas the ‘unique coverage’ is 75% (3/4).
Some TSs enable the composition of new concepts by combining two or more existing concepts. We call this feature “post-coordination”. In case a TS enables post-coordination of concepts, it is also important to consider whether or not post-coordinated concepts are taken into account when determining the coverage of the concepts. Many concepts might not be present in a TS as a pre-coordinated concept, but can be composed by combining two or more concepts.
If only pre-coordinated concepts are considered in the evaluation, then the measured concept coverage will be lower than when post-coordinated concepts are also taken into account.
In this study we also focus on the correctness of the content of TSs. Correctness can only be measured for concepts, terms and relations that are covered by the TS. Definitions of correctness are provided in Table 1.
Table 1 Definitions for coverage and correctness of (the elements of) the content of a TS
Content coverage: the extent to which the content (e.g. concepts or terms) within a subset, representative for the domain of interest, can be represented by the content of the terminological system.
Concept coverage: the extent to which the concepts within a subset, representative for the domain of interest, can be represented by the concepts within the terminological system.
Occurring concept coverage: concept coverage using a subset in which each concept may occur more than once, indicating the occurrence of that concept in practice.
Unique concept coverage: concept coverage using a subset in which each concept occurs at most once.
Post-coordinated concept coverage: the extent to which the concepts within a representative subset can be represented by the concepts (either pre-existing or created with use of composition rules) within the terminological system.
Term coverage: the extent to which the terms within a representative subset exist in the terminological system’s content, provided that the terms relate to concepts that are present in the terminological system.
Occurring term coverage: term coverage using a subset in which each term may occur more than once, indicating the occurrence of that term in practice.
Unique term coverage: term coverage using a subset in which each term occurs at most once.
Relation coverage: the extent to which actual relations between concepts are represented in the TS’s content, provided that they can be represented considering the semantic model of the TS.
Concept correctness: the extent to which the concepts that exist in the TS are non-redundant, non-vague and non-ambiguous.
Term correctness: the extent to which the terms that are attached to concepts in the TS are free of textual errors and attached to the right concepts.
Relation correctness: the extent to which the relations that exist in the TS are consistent and in accordance with the factual relations between concepts.
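The difference between occurring and unique concept coverage defined above can be illustrated with a minimal Python sketch. The function name, the TS content and the sample are hypothetical and chosen only so that the numbers reproduce the 4/5 and 3/4 example quoted for Figure 1; matching is simplified to exact string equality, which is an assumption and not the matching procedure used in the study.

```python
from collections import Counter


def concept_coverage(sample, ts_concepts):
    """Return (occurring, unique) concept coverage as fractions of the sample."""
    if not sample:
        raise ValueError("sample must contain at least one concept")
    counts = Counter(sample)  # concept -> number of occurrences in practice
    matched_occurrences = sum(n for concept, n in counts.items() if concept in ts_concepts)
    matched_unique = sum(1 for concept in counts if concept in ts_concepts)
    occurring = matched_occurrences / sum(counts.values())
    unique = matched_unique / len(counts)
    return occurring, unique


# Hypothetical TS content and sample (names invented for illustration).
ts_concepts = {"pneumonia", "sepsis", "asthma", "stroke"}
sample = ["pneumonia", "pneumonia", "sepsis", "asthma", "pancreatitis"]

occurring, unique = concept_coverage(sample, ts_concepts)
print(f"occurring coverage: {occurring:.0%}")  # 4 of 5 occurrences matched
print(f"unique coverage: {unique:.0%}")        # 3 of 4 distinct concepts matched
```

Because the frequently occurring concept is present in the TS, the occurring coverage in this sketch is higher than the unique coverage; a missing frequent concept would shift the two figures the other way.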
  • 3. 2.2 Evaluation Methods
Table 2 contains a list of aspects of TSs that have been evaluated by others. These aspects have been categorized according to the aspects of the coverage and correctness of TSs’ content that were defined in Table 1.
Most of the evaluation studies described in the literature focus on the coverage of either the concepts or the terms in a TS. The coverage of concepts or terms is often evaluated through ‘concept matching’ or ‘term matching’ [7, 11-13, 17-19, 22]. Matching implies that a representative sample of concepts or terms is extracted from the domain in which the TS is being used, e.g. the diagnoses of patients at the oncology department. The representative sample can be randomly or non-randomly chosen. The concepts from the sample are then matched with those of a TS. The extent to which concepts or terms in the sample can be matched with concepts in the TS is mostly presented by means of matching categories. For example, Chute et al. [12] have applied a scoring scale for the matching of concepts from 0 to 2, where 0 = no match, 1 = fair match, 2 = complete match. A different categorization of matches was used by Warnekar et al. [29]. According to this categorization, matches could be exact matches, lexical or semantic matches, or no-matches.
Fig. 1 Example of a terminological system and a representative sample containing concepts that are being matched to the terminological system to evaluate the concept coverage
Table 2 Categorization of quality aspects of the content of TSs, as evaluated by others
Column groups: Coverage (Concepts, Terms, Relations); Correctness (Concepts, Terms, Relations)
Aspects in column order, with N: Concept coverage (14); Consistent level of abstraction (1); Term coverage (10); Coverage of relationships (7); Availability of definitions (1); Completeness of definition (1); Non-redundancy (3); Lexical consistency (1); Appropriate terms (3); Accurate hierarchical relations (1); Consistent hierarchical relations (6); Sensible and appropriate hierarchical pairs (2); Non-redundant hierarchical relations (1); Non-circular hierarchical relations (3); Consistent categorization (2); Relevant categorization (1); Unambiguous categorization (1); Accurate non-hierarchical relations (1); Accurate definitions (1)
Bakken 1994 [11]: X X
Chute 1996 [12]: X
Campbell 1997 [7]: X X X X
Humphreys 1997 [13]: X X
Bodenreider 1998 [14]: X X X X
Cimino 1998 [15]: X X X
Cimino 1999 [16]: X X X
Bowles 2000 [17]: X X X X X
Cimino 2001 [18]: X X X
Hardiker 2001 [19]: X X X
Schulz 2001 [20]: X X
Bodenreider 2001 [21]: X
Bodenreider 2002 [22]: X X
Bodenreider 2002 [23]: X X
Cornet 2002 [24]: X X
Moss 2003 [25]: X
Hardiker 2003 [26]: X X X X X
Hardiker 2003 [27]: X
Bodenreider 2003 [28]: X X
Warnekar 2003 [29]: X X
Wasserman 2003 [30]: X X X
Brown 2004 [31]: X
Penz 2004 [32]: X
Ceusters 2004 [33]: X X X X X
  • 4. Wasserman et al. [30] distinguished, apart from the exact matches, synonyms and no-matches that required the addition to the hierarchy of a new ‘leaf’, a new ‘leaf with multiple stems’ or a ‘graft to an existing branch’. The content coverage can be represented, for example, by calculating the percentage of perfect matches. To be representative, the source of the sample of concepts or terms should reflect the intended use of the TS. For example, if a TS will be used by nurses for documentation of nursing information, then the sample of concepts could well be extracted from existing nursing documentation in medical records [11, 17]. While in most cases the matching process is performed by humans, Penz et al. [32] applied two automated mapping tools. Penz et al. were not able to assess their overall value, due to high frequencies of spelling errors and jargon in their sample. In a study of Brown et al. [31] one of the automated mapping tools, the SmartAccess Vocabulary Server (SAVS), has proven to be reliable.
Besides the ‘matching method’, other methods have been applied to evaluate the concept coverage or term coverage. In a study by Bodenreider et al. [14] the system being evaluated had already been in use for some time. The measure of coverage of concepts in the system was based on the number of concepts that had to be added to the system by the users due to under-representation in the TS. In addition, they evaluated the coverage of hierarchical and other relations within the UMLS. They designed an algorithm to automatically extract the UMLS concepts that are related to procedures in a particular domain. Starting with a few concepts related to the domain, the algorithm recursively selected all their subordinate concepts. This navigation was based on the relations between concepts.
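The selection of subordinate concepts described above can be sketched as a graph traversal. This is only an illustration under stated assumptions: the function name, the `children` mapping and the concept names are hypothetical, an explicit stack is used in place of recursion, and nothing here reproduces the algorithm of Bodenreider et al. or the structure of the UMLS.

```python
def collect_descendants(seeds, children):
    """Return the seed concepts plus every concept reachable through subordinate relations."""
    selected = set()
    stack = list(seeds)
    while stack:
        concept = stack.pop()
        if concept in selected:
            continue  # already visited; also guards against circular relations
        selected.add(concept)
        stack.extend(children.get(concept, ()))
    return selected


# Hypothetical parent -> subordinate relations (invented for illustration).
children = {
    "cardiac procedure": ["valve replacement", "coronary bypass"],
    "valve replacement": ["aortic valve replacement"],
}

retrieved = collect_descendants(["cardiac procedure"], children)
print(sorted(retrieved))
```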
A lack of relations resulted in silence: a concept might seem not to exist in the UMLS only because it is not related to another concept. The number of missing relations was estimated by comparing the concepts in the subset retrieved by the algorithm to the concepts that are needed for the representation of the procedures within the domain of interest.

Whereas Bodenreider used the relations between concepts to evaluate the content coverage, these relations are more often used for the evaluation of the correctness of the content. Many TSs nowadays consist of more than just a simple list of terms; hierarchical and non-hierarchical relations exist between the concepts. By looking at the relations between concepts, inconsistent, ambiguous or redundant concepts may be revealed. For example, if two individual concepts share the same meaning, they are actually redundant concepts. If ‘polyneuropathy’ and ‘polyneuritis’ were each defined as separate concepts, one could find that they share the same relations to other concepts, and that they actually refer to the same disease. In this case one of the concepts ‘polyneuropathy’ and ‘polyneuritis’ is redundant. In the case of a hierarchical structuring of the concepts, a concept might be inconsistently classified if it has relations which are in conflict with the relations of its superordinate concept. For example, inconsistency might occur if a superordinate disease is defined to be caused by a bacterium, whereas the subordinate disease is defined to be caused by a virus. The inconsistency here becomes apparent if ‘virus’ and ‘bacterium’ are explicitly made mutually exclusive. Cimino used this kind of method to detect ambiguities, redundancy and inconsistent hierarchical relations within the UMLS [15]. In addition, Bodenreider et al. [14] stated that hierarchical relations between concepts can be used for the evaluation of the categorization of concepts.
Their evaluation was based on the idea that concepts inherit properties from their superordinate concept and thus a concept is supposed to belong at least to the same category or categories as its superordinate concept.

Evaluation based on relations between concepts has the potential to be automated or semi-automated. For example, a computer algorithm could detect concepts that share exactly the same definitions, or concepts that were assigned to a number of semantic types of which two are mutually exclusive. To enable automated evaluation, the TS content (especially the relations between concepts) should be represented in a formal way. Examples of formal representations can be found in the SNOMED-CT [34] and the GALEN terminologies [35]. SNOMED-CT and GALEN both use a Description Logic to represent their knowledge. In a study by Cornet et al. [24], migration of the content representation from frame-based to Description-Logic-based proved to be valuable in determining redundancies in concept definitions and in forcing the knowledge modeler to be aware of ambiguities. Schulz and Hahn [20] have expressed UMLS knowledge in Description Logic. They provide evidence that embedding the knowledge into a formal reasoning framework is effective in identifying inconsistencies.

Bodenreider et al. [23] applied another approach to the automated detection of inconsistencies, by using the lexical knowledge contained in a terminological system. They assume that all terms are composed of a modifier, such as ‘primary’ or ‘secondary’, and a context, a noun phrase such as ‘adrenocortical insufficiency’. This would result in the terms ‘primary adrenocortical insufficiency’ and ‘secondary adrenocortical insufficiency’. They hypothesize that terms of the form modifier1-context and modifier2-context are co-hyponyms of the term ‘context’. For example, ‘primary adrenocortical insufficiency’ and ‘secondary adrenocortical insufficiency’ are hyponyms of ‘adrenocortical insufficiency’.
They base their evaluation on the fact that in a consistent terminology the terms modifier1-context and modifier2-context should be 1) both present and 2) in hierarchical relation with the term ‘context’. The conclusion of this study was that this method alone is not sufficient for ensuring the consistency of a TS.

A completely different method to evaluate the correctness of relations between concepts was applied by Campbell et al., who evaluated the clinical utility of pairs of hierarchically related concepts within three medical terminological systems (SNOMED, READ and UMLS) [7]. Six clinician-informatics specialists manually reviewed random samples of hierarchical pairs. They used a five-point Likert scale (1 = extremely dissatisfied with pairing, 3 = neutral, 5 = extremely satisfied with pairing) to rate the clinical utility of each pair. Review by domain experts was also applied in a study by Hardiker [26].
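The two lexical conditions used by Bodenreider et al. [23] (both modified terms present, and each in a hierarchical relation with the term ‘context’) can be sketched as follows. This is an illustration under simplifying assumptions, not their implementation: the function `check_modifier_pairs`, the single modifier pair, the prefix-based parsing and the restriction to direct parent links are all hypothetical choices made here.

```python
# Assumed list of modifier pairs; a real study would use a larger inventory.
MODIFIER_PAIRS = [("primary", "secondary")]


def check_modifier_pairs(terms, parents, modifier_pairs=MODIFIER_PAIRS):
    """Report terms that violate the two lexical consistency conditions.

    terms:   set of term strings in the terminology
    parents: dict mapping a term to the set of its direct broader terms
    """
    findings = []
    for term in sorted(terms):
        for first, second in modifier_pairs:
            for modifier, other in ((first, second), (second, first)):
                prefix = modifier + " "
                if not term.startswith(prefix):
                    continue
                context = term[len(prefix):]
                sibling = other + " " + context
                # Condition 1: the term with the other modifier should exist.
                if sibling not in terms:
                    findings.append((term, "missing co-hyponym: " + sibling))
                # Condition 2: the term should be placed under its context.
                # Simplification: only direct parent links are inspected.
                if context not in parents.get(term, set()):
                    findings.append((term, "not linked to context: " + context))
    return findings


# Hypothetical toy terminology.
terms = {
    "adrenocortical insufficiency",
    "primary adrenocortical insufficiency",
    "secondary adrenocortical insufficiency",
    "primary hypertension",
}
parents = {
    "primary adrenocortical insufficiency": {"adrenocortical insufficiency"},
    "secondary adrenocortical insufficiency": set(),
}

findings = check_modifier_pairs(terms, parents)
for term, problem in findings:
    print(term, "->", problem)
```

In this toy data the check flags ‘primary hypertension’ twice (no ‘secondary’ counterpart, no link to ‘hypertension’) and ‘secondary adrenocortical insufficiency’ once (no link to its context).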
In summary, we distinguish four evaluation strategies for the content of TSs: 1) concept matching, 2) evaluation based on relations between concepts, 3) evaluation by domain experts, and 4) evaluation based on lexical knowledge.

Based on this literature review on methods for evaluation of TSs we selected three methods for the case study, which will be described below. The three methods reflect the first three of the above-mentioned evaluation strategies. The methods were chosen because previous studies deemed them promising and because they were applicable to our case study with the DICE TS.

3. Case Study

3.1 Background

A study by de Keizer et al. in 1998 showed that none of the contemporary TSs met the criteria of a TS for Dutch intensive care [6]. This was the motivation to develop a new TS, Diagnoses for Intensive Care Evaluation (DICE). The TS DICE comprises reasons for admission to the Intensive Care Unit (ICU), and some of their characteristics, such as the anatomical localization, the dysfunction and the etiology. The DICE TS contains 2373 concepts, of which 1456 are diagnoses that form reasons for admission to the ICU. Other concepts include, for example, anatomical locations and causes of disease. There are 50 relation types. Thirteen of these, for example “has_anatomical_localization”, may be used for any of the diagnoses. The other 37 relation types are attributes which are specific for certain diagnoses, for example the chronicity (e.g. acute, chronic) of organ dysfunction. Currently a total of 10,425 relations between concepts have been defined. DICE can be incorporated into Patient Data Management Systems to facilitate documentation by physicians, and it is intended to be used for patient selection and aggregation for medical research and management overviews.

Two domain experts, i.e.
intensive care physicians, and two medical informaticians started seven years ago with a rather simple hierarchy of reasons for ICU admission derived from the ICNARC Coding Method [36]. Due to the complexity of concepts in the domain, the need for a separation of concepts and terms, and the need for a structure to enable aggregation of homogeneous patient groups, we chose to specify the concepts and their characteristics more formally. The DICE content was therefore converted to a frame-based structure. In the development process of DICE we are currently at the stage where we need to evaluate to what extent the current content of DICE meets the requirements, in terms of coverage and correctness, for the intended use of the system.

3.2 Methods

In the case study we apply three methods: concept matching, formal algorithmic evaluation and expert review, to evaluate the coverage and the correctness of the DICE content and to analyze the extent to which these methods overlap or complement each other. We compare the overall coverage and correctness measured by the three methods. For methods 2 and 3 the individual missing or incorrect concepts, terms and relations revealed by each method are compared. The methods used are described below.

3.2.1 Method 1: Concept Matching

In this study concept matching was carried out twice with different representative samples of concepts that were matched to the TS. The samples differed in the sources from which they were retrieved. The two sources reflected the two distinct purposes of the system, i.e. 1A) the documentation of and communication about patients’ reason(s) for admission and 1B) the selection of homogeneous (with respect to diagnosis) patient groups for clinical research and management. Concepts from these two representative samples were matched to the content of DICE. Concepts found in DICE could be 1) a perfect match, 2) related (e.g.
mitral valve instead of tricuspid valve), 3) too narrow in meaning (e.g. subarachnoid haemorrhage instead of intracranial haemorrhage), 4) too general in meaning (e.g. polyneuropathy instead of infectious polyneuropathy), or 5) a concept could not be coded at all. We applied this categorization because, in case of a suboptimal match, it enabled us to be specific about why a match was not a ‘perfect match’. The distribution of the concepts among the matching categories was calculated both when using all diagnoses occurring in the sets and when using only the unique diagnoses. DICE offers the users the opportunity to compose new (post-coordinated) concepts out of two or more existing (pre-coordinated) concepts. In this study the diagnoses within DICE that were matched to the diagnoses within the samples could also be post-coordinated diagnoses.

Method 1A: Evaluation for Documentation of Reasons for Admission

DICE was used at the intensive care department of the Academic Medical Center in Amsterdam during March 2001. Attending intensive care physicians used the system in real practice to code actual patients’ reasons for admission. The reasons for admission that the physicians wanted to record in a patient’s medical record comprised the representative sample of concepts. A physician assigned each concept within the sample to a matching category to express the extent to which it was represented in DICE. In case of a non-perfect match the physician entered the actual diagnosis in free text. This enabled checking the correct assignment of concepts to the matching categories.

Method 1B: Evaluation for Aggregation of Patient Groups

We collected all diagnoses that formed (a part of) the inclusion and exclusion criteria of clinical studies that appeared in two important intensive care journals (Intensive Care Medicine and Critical Care Medicine) between January 1, 2001 and July 1, 2001.
These diagnoses comprised the representative sample of concepts that was matched to the TS DICE. The concepts within the sample were assigned to one of the matching categories by two of the authors (DA and EJ), by means of consensus.
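The calculation of the distribution over the matching categories, for all occurring diagnoses and for unique diagnoses only, can be sketched as below. The sample data are invented for illustration and are not taken from the study; the function name `category_distribution` and the category labels are assumptions, and the unique variant assumes that a given diagnosis is always assigned the same category.

```python
from collections import Counter

# The five matching categories described in Method 1 (labels assumed here).
CATEGORIES = ("perfect match", "related", "too narrow", "too general", "no match")


def category_distribution(assignments, unique=False):
    """Return the percentage of concepts per matching category.

    assignments: list of (diagnosis, category) tuples, one per occurrence.
    unique:      if True, count every distinct diagnosis only once.
    """
    if unique:
        # Assumption: a diagnosis always receives the same category.
        assignments = list(dict(assignments).items())
    if not assignments:
        raise ValueError("no assignments to summarize")
    counts = Counter(category for _, category in assignments)
    total = len(assignments)
    return {c: round(100 * counts[c] / total, 1) for c in CATEGORIES}


# Hypothetical sample: five occurrences of four distinct diagnoses.
sample = [
    ("pneumonia", "perfect match"),
    ("pneumonia", "perfect match"),
    ("sepsis", "perfect match"),
    ("infectious polyneuropathy", "too general"),
    ("rare syndrome X", "no match"),
]

occurring = category_distribution(sample)
unique = category_distribution(sample, unique=True)
print(occurring)
print(unique)
```

In this toy sample the share of perfect matches is higher for occurring concepts than for unique concepts, because the frequently recorded diagnosis is one that matches perfectly.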
3.2.2 Method 2: Formal Algorithmic Evaluation

As an increasing number of medical terminological systems are based on formal representation, it is important to understand the potential of readily available reasoning algorithms exploiting the formal representation. Such algorithms can be instrumental for evaluation of the content of terminological systems. One example of deploying such algorithms is the constraint checking engine based on Protégé Axiom Language (PAL), which has been applied to the Gene Ontology [37]. Whereas in this example a frame-based representation is used, we have used the Description Logic (DL) formalism [38]. The publicly available reasoner RACER [39] was used to perform satisfiability testing, which provides a means for detecting mutually conflicting concept definitions. As the content of DICE has a frame-based representation, it was migrated to a DL-based representation, according to the process described in [40]. In this migration process, assumptions are made on the semantics of definitions, such as disjointness of sibling concepts.

For this evaluation we randomly extracted a sample of 80 pre-coordinated diagnoses in DICE. The reasoning process revealed concepts that had inconsistent definitions, which indicated the presence of incorrect (hierarchical or non-hierarchical) relations. For example, if the superordinate concept infectious polyneuropathy (see Table 3) was defined to be caused by a virus and the subordinate leprosy polyneuropathy was defined to be caused by Mycobacterium leprae, while it was known that Mycobacterium leprae is not a virus, then the subordinate concept would be identified as inconsistent. The inconsistency here could have been caused by the fact that the etiology of the superordinate concept should also include bacterium instead of virus alone, or by the fact that leprosy polyneuropathy should not have been classified as a subordinate concept of infectious polyneuropathy.
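The kind of conflict in the fictitious polyneuropathy example can be illustrated with a small hand-written check. This is not a Description Logic reasoner and not how RACER works; it is a sketch that only detects one pattern (an inherited "all causes are X" restriction clashing with a "some cause is Y" restriction where X and Y are disjoint). The dictionaries and function names are assumptions made for illustration.

```python
# Class hierarchy of causes and disjointness axioms from the fictitious example.
SUBCLASS_OF = {"Mycobacterium_Leprae": "Bacterium"}
DISJOINT = {frozenset({"Virus", "Bacterium"})}

# Each concept: its superordinate concept, plus the filler of an existential
# ("some cause is ...") and a universal ("every cause is ...") restriction.
CONCEPTS = {
    "Polyneuropathy": {"parent": None, "some_cause": None, "only_cause": None},
    "Infectious_polyneuropathy": {
        "parent": "Polyneuropathy", "some_cause": "Virus", "only_cause": "Virus",
    },
    "Leprosy_polyneuropathy": {
        "parent": "Infectious_polyneuropathy",
        "some_cause": "Mycobacterium_Leprae",
        "only_cause": "Mycobacterium_Leprae",
    },
}


def ancestors_or_self(cls):
    """Return cls followed by all of its superclasses."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def are_disjoint(a, b):
    """True if a and b, or any of their superclasses, are declared disjoint."""
    return any(
        frozenset({x, y}) in DISJOINT
        for x in ancestors_or_self(a)
        for y in ancestors_or_self(b)
    )


def inherited(concept, key):
    """Collect restriction fillers from the concept and all its superordinates."""
    fillers = []
    while concept is not None:
        value = CONCEPTS[concept][key]
        if value is not None:
            fillers.append(value)
        concept = CONCEPTS[concept]["parent"]
    return fillers


def is_inconsistent(concept):
    """A required cause that is disjoint with an 'only' restriction is a clash."""
    some_fillers = inherited(concept, "some_cause")
    only_fillers = inherited(concept, "only_cause")
    return any(are_disjoint(s, o) for s in some_fillers for o in only_fillers)


inconsistent = [name for name in CONCEPTS if is_inconsistent(name)]
print(inconsistent)
```

The check is sound only for this one pattern and is far from complete; a DL reasoner derives such clashes, and many others, from the general semantics of the definitions.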
Table 3 A fictitious example of (inconsistent) concept definitions in a DL-based terminological system
Infectious_polyneuropathy ⊆ Polyneuropathy ∩ ∃ hascause Virus ∩ ∀ hascause Virus
Leprosy_polyneuropathy ⊆ Infectious_polyneuropathy ∩ ∃ hascause Mycobacterium_Leprae ∩ ∀ hascause Mycobacterium_Leprae
Mycobacterium_Leprae ⊆ Bacterium
Disjoint (Virus, Bacterium)

3.2.3 Method 3: Expert Review

We printed on paper forms the terms and (hierarchical and non-hierarchical) relations belonging to each of the 80 randomly selected concepts that were also used in the formal algorithmic evaluation (Fig. 2). Six domain experts, all experienced intensive care physicians, manually reviewed the terms and relations belonging to the concepts. If they found a missing or incorrect term or relation they wrote this on the paper forms. One of the authors (DA) collected and analyzed the comments of the domain experts.

Fig. 2 Example of a paper form for expert review

3.3 Results

3.3.1 Concept Matching

During the study to evaluate the coverage of DICE for documentation of reasons for admission (1A), 10 ICU physicians registered a total of 164 diagnoses, of which 107 were unique. For the concept matching to evaluate the coverage of DICE for aggregation of patient groups (1B) we retrieved 218 diagnoses, of which 187 were unique. The overlap of the two samples consisted of eight unique diagnoses. The distribution of concepts among the matching categories is displayed in Figure 3.

Concept matching for documentation of diagnoses in daily care practice (1A) resulted in 63% (n = 67) perfect matches when
only uniquely occurring diagnoses were considered (unique concept coverage) and 74% (n = 121) perfect matches when each single occurrence of a diagnosis in the sample was considered (occurring concept coverage). Concept matching for aggregation of patient groups (1B) resulted in lower frequencies of perfect matches (52% (n = 97) unique, 51% (n = 111) occurring).

The frequency of ‘too general’ matches was higher for the concept matching for aggregation of patient groups for clinical research and management (1B) (25% unique, 24% occurring) than for documentation of diagnoses (1A) (14% unique, 10% occurring). The structure of DICE enables the composition of new diagnoses by specifying their non-hierarchical relations (post-coordination). ‘Too general’ matches indicated that one or more non-hierarchical relations that are necessary to enable the post-coordination of that specific diagnosis were missing.

DICE enables users to search for a diagnosis based on its characteristics (non-hierarchical relations). For example, a user might search DICE for a diagnosis that is an infection that is located in the lungs and retrieve the diagnosis Pneumonia. In one case the physicians appeared to be unable to find a diagnosis based on its characteristics. This indicated that the characteristics that the physician attached to this diagnosis were not consistent with those in DICE. The concept was available in DICE, but some characteristics were missing. The matching category assigned by the physician to this diagnosis was incorrect, i.e. ‘no match’, whereas it actually had to be ‘too general’. The correct category, ‘too general’, was used in the analysis of the results.
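As a worked check, the reported coverage percentages follow from the counts given above (perfect matches divided by the number of unique or occurring diagnoses in each sample). The short sketch below only recomputes those figures; the function and variable names are illustrative.

```python
def coverage_percentage(perfect_matches, total):
    """Share of perfect matches, rounded to a whole percentage."""
    return round(100 * perfect_matches / total)


# (sample, counting mode) -> (perfect matches, diagnoses), as reported in 3.3.1.
reported = {
    ("1A", "unique"): (67, 107),
    ("1A", "occurring"): (121, 164),
    ("1B", "unique"): (97, 187),
    ("1B", "occurring"): (111, 218),
}

coverage = {
    key: coverage_percentage(perfect, total)
    for key, (perfect, total) in reported.items()
}
for (sample, mode), value in coverage.items():
    print(f"{sample} {mode}: {value}%")
```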
3.3.2 Formal Algorithmic Evaluation and Expert Review

The formal algorithmic evaluation of the 80 concepts randomly selected from DICE revealed 28 concepts with inconsistent definitions. Ten inconsistencies were due to erroneous assumptions made during the migration of DICE from the frame-based to the DL-based representation. The remaining 18 inconsistent concepts were caused by 32 errors, consisting of ten missing relations, ten incorrect relations and twelve relations that were specified as ‘defining’ instead of ‘qualifying’ relations. An example of the latter is ‘pericarditis’, which was defined as always being an infection (defining relation), whereas in fact this is not always the case, i.e. ‘pericarditis’ can be an infection (qualifying relation). Pericarditis is only an infection in case a bacterium or a virus caused the inflammation.

For the 80 selected concepts the six domain experts found 397 missing relations, 123 incorrect relations, 70 relations that were specified as ‘defining’ instead of ‘qualifying’ relations, five missing terms, 15 incorrect terms and two diagnoses that should be deleted from the TS content. Of the total of 612 unique discrepancies, 173 (28%) were found by two or more domain experts and 83 (14%) were found by three or more domain experts. Twenty-four errors were found both with the formal algorithmic evaluation and by the expert review. Table 4 displays the numbers of the different types of errors and omissions that were discovered by each of the three methods applied in this study.

4. Discussion

Over the past 15 years much has been written about the content of TSs.
Requirements for the content of TSs and evaluations of the quality of the content of TSs have been described. This paper provides an overview of what aspects determine the quality of a TS’ content and how these aspects can be evaluated. The aspects most often evaluated were the coverage of terms and of concepts [7, 11-14, 16, 17, 22, 25, 26, 29-32]. Lately the coverage and the correctness of relations between concepts have received increasing attention, especially by Bodenreider et al. [14, 21-23, 28] and Hardiker [19, 26, 27].

Fig. 3 Distribution of concepts among the matching categories. The matching categories represent the extent to which concepts in the TS DICE matched concepts within representative samples for (1A) documentation of diagnoses in daily care practice and (1B) aggregation of patient groups for clinical research and management, including post-coordination, split up into unique and occurring concept matching.

Table 4 Numbers of missing or incorrect concepts, terms or relations discovered by the three methods
Quality aspects of TS’ content                 Coverage                       Correctness                    Total number of
                                               Concepts  Terms  Relations     Concepts  Terms  Relations     errors found
Concept matching 1A (164 concepts)             24        9      19            -         -      -             52
Concept matching 1B (218 concepts)             49        4      58            -         -      -             111
Formal algorithmic evaluation (80 concepts)    -         -      10            -         -      22            32
Expert review (80 concepts)                    5         5      392           2         15     193           612

The case study described in this paper provides a comparison of three methods that can be applied to evaluate the coverage and/or the correctness of the content of TSs. The first method consisted of two ‘concept matching’ studies which only differed in the origin of the representative samples of concepts to be matched. The overlap of the two samples was relatively small. The matching categories assigned by the physicians for evaluation of DICE for documentation of diagnoses (1A) were checked by the same two people that assigned the categories for the evaluation of DICE for aggregation of patient groups (1B). This makes the assigned matching categories comparable between the two studies. Only once was the assigned matching category found to be incorrect. This indicates that the method, ‘concept matching’ by physicians, produces reliable results.

Differences in the results of the two ‘concept matching’ studies especially concerned the percentage of perfect matches and the percentage of concepts that were found to be too general in meaning. Hales et al. [41] have asserted that the quality of a TS is defined relative to its intended use, which is a major barrier to the evaluation of TSs.
The intended use of a TS specifically plays an important role in the evaluation of its content. The results of this study endorse the assertions of Hales et al. The quality of the content of DICE did appear to be relative to the purpose of the system, and it appeared that in the case of DICE we could not rely on a single measure (1A or 1B) to get a complete overview of the coverage of the DICE content.

When interpreting the results of the ‘concept matching’ evaluation as applied here, one needs to keep in mind that it concerns only a sample of concepts. What we measure by concept matching is the ‘concept coverage’, which is merely an approximation of the completeness of all concepts that belong to the domain of interest (‘domain completeness’). In view of the methods for retrieval of the representative samples of concepts (or terms) there is a chance that the sample does not contain concepts that only rarely occur in practice. However, the question of whether rarely occurring concepts are represented in a TS might not be as important as whether concepts that frequently occur are represented. The frequencies of occurrence of concepts in practice can be taken into account by using each single occurrence of a concept within the representative sample for (occurring) concept matching. The increase in concept coverage when measuring the occurring instead of the unique concept coverage (Fig. 4) indicates that, in the case of DICE, the concepts that were not represented were mostly the infrequently occurring concepts.

There appeared to be large differences between the numbers of errors or omissions found by the formal algorithmic evaluation and those found by the domain experts. The physicians identified many more errors and omissions than the formal algorithmic evaluation.
The difference stems from the fact that the formal algorithmic evaluation only revealed logically incorrect definitions, whereas the physicians also identified wrong terms and definitions that were logically correct but clinically incorrect. For example, if the definition of encephalopathy stated that it always involves a state of coma, then the formal algorithmic evaluation would render this correct. The physician, however, would not agree with this definition. Instead (s)he would rather say that a patient suffering from encephalopathy could be in a state of coma.

The formal algorithmic evaluation, as it was performed here, has some other drawbacks. The migration of DICE from a frame-based to a DL-based representation required a number of assumptions. In our case these assumptions made some concepts appear inconsistent, whereas they actually were not. Another shortcoming was that the formal algorithmic evaluation as presented here could only identify that definitions were inconsistent. The pinpointing of the actual causes of the inconsistencies had to be done manually. We are currently working on ways to automate this identification process [42, 43].

The major drawback of the expert review was that it took the physicians much time to look carefully at all the terms and relations belonging to a concept. Similarly, the analysis of the comments generated by the physicians was a very time-consuming effort. A large number of the comments were given by only one reviewer. Since in most cases only comments on which reviewers have reached consensus will be processed, a large part of the comments will not be used for updating the TS. It is arguable how many reviewers have to be involved in the consensus process and how many reviewers have to agree on a comment before it is processed. Automated detection of inconsistent concepts, such as the formal algorithmic evaluation as it was applied in this study, does have the potential to limit and focus the effort of domain experts in the reviewing process. However, this requires that these methods are further explored. We will consider this in our future research. To decrease the efforts for physicians to review the content of a TS we have started the implementation of a system for ‘internet-based terminological knowledge reviewing’, called KEBoRT (Knowledge Editorial Board online Reviewing Tool), which will be used for reviewing the entire DICE content [44].

A shortcoming of the case study is that the evaluation methods were only applied to the TS DICE. There is a chance that the large number of errors and omissions identified by the expert reviewers was due to the fact that the current content of DICE is not based on expert consensus. Instead, the developers of DICE consulted two domain experts when building the TS. It might be that expert review as it was applied in this case study would produce fewer comments if all concepts, terms and relations had been approved, by means of consensus between a larger number of domain experts, before they were included in the TS. This should be considered when generalizing the conclusions of this case study regarding the number of errors found by expert review.

Looking at the results produced by the three methods, we see that the ‘concept matching’ methods, which were originally designed to evaluate the coverage of concepts, also revealed some missing terms and (non-hierarchical) relations. The formal algorithmic evaluation revealed only missing and incorrect relations. The expert review revealed mainly missing and incorrect relations, but also a small number of missing and incorrect concepts and terms.

5. Conclusion

Evaluation studies of the content of TSs mostly concern ‘term matching’ or ‘concept matching’. More sophisticated methods are being explored.
Independent of the method used, it remains important to define exactly what is being measured. The definitions provided in this article could be a starting point for this. Based on the results of the case study it seems that expert review is most complete in evaluating the quality of the content of TSs. Expert review produces results for all aspects that determine the quality of the content. However, the expert review method has some major drawbacks, of which the most important is the fact that it is very time-consuming. In order to get a good overview of the quality of the content of a TS, it is preferable to use a combination of evaluation methods. The ‘concept matching’ method seems to be most useful to determine the coverage of the concepts and terms in the TS. However, it is important to consider the source that was used to retrieve the sample of concepts that are being matched to the TS. The intended purpose of the TS should determine the source of the sample. Different sources can lead to different results regarding the quality of the content of a TS. In addition, a clear description of the applied concept matching method is necessary. Formal algorithmic evaluation has the potential to decrease the workload of human reviewers, but further research is required to explore this potential. From this study it became clear that each method has its strengths and weaknesses. Therefore the three methods should be used in combination with each other.

Acknowledgments

We thank the members of the board of the Dutch National Intensive Care Evaluation (NICE) foundation and Margreeth Vroom, the head of the department of intensive care of the Academic Medical Center in Amsterdam, for their cooperation in this study. We thank the Dutch ministry of Health, Welfare and Sports for their financial support. The preliminary results of this study have been presented in a reduced form in the AMIA 2003 and the Medinfo 2004 proceedings [45, 46].

References

1.
De Keizer N, Abu-Hanna A, Zwetsloot-Schonk J. Understanding terminological systems I: terminology and typology. Meth Inform Med 2000; 39 (1): 16-21.
2. World Health Organization. International Classification of Diseases, manual of the International Statistical Classification of diseases, injuries and causes of death: 9th revision; 1977.
3. Cote R, Robboy S. Progress in medical information management. Systemized nomenclature of medicine (SNOMED). JAMA 1980; 243: 756-62.
4. Cote R, Rothwell D, Palotay J, Beckett R, Brochu L. The systemized nomenclature of human and veterinary medicine: SNOMED International. College of American Pathologists, 1993.
5. North American Nursing Diagnosis Association. NANDA nursing diagnoses: Definitions and classifications, 1999-2000. Philadelphia, PA: NANDA, 1999.
6. De Keizer N, Abu-Hanna A, Cornet R, Zwetsloot-Schonk J, Stoutenbeek C. Analysis and design of an ontology for intensive care diagnoses. Methods Inf Med 1999; 38 (2): 102-12.
7. Campbell J, Carpenter P, Sneiderman C, Cohn S, Chute C, Warren J. Phase II evaluation of clinical coding schemes: Completeness, taxonomy, mapping, definitions, and clarity. J Am Med Inform Assoc 1997; 4 (3): 238-51.
8. Chute C, Cohn S, Campbell J. A framework for comprehensive health terminology systems in the United States. Development guidelines, criteria for selection, and public policy implications. J Am Med Inform Assoc 1998; 5 (6): 503-10.
9. Cimino J. Desiderata for controlled medical vocabularies in the twenty-first century. Meth Inform Med 2001; 37 (4-5): 394-403.
10. ISO/TC215 WG 3. Standard Specification for Quality Indicators for Controlled Health Vocabularies; 2000 July. Report No.: TS17117.
11. Bakken Henry S, Holzemer W, Reilly C, Campbell K. Terms used by nurses to describe patient problems: Can SNOMED III represent nursing concepts in the patient record? J Am Med Inform Assoc 1994; 1 (1): 61-74.
12.
Chute C, Cohn S, Campbell K, Oliver D, Campbell J. The content coverage of clinical classifications. J Am Med Inform Assoc 1996; 3 (3): 224-33.
13. Humphreys B, McCray A, Cheh M. Evaluating the coverage of controlled health data terminologies: Report on the results of the NLM/AHCPR large scale vocabulary test. J Am Med Inform Assoc 1997; 4 (6): 484-500.
14. Bodenreider O, Burgun A, Botti G, Fieschi M, Le Beux P, Kohler F. Evaluation of the Unified Medical Language System as a medical knowledge source. J Am Med Inform Assoc 1998; 5 (1): 76-87.
15. Cimino J. Auditing the unified medical language system with semantic methods. J Am Med Inform Assoc 1998; 5 (1): 41-51.
16. Cimino J, McNamara T, Meredith T, Broverman C, Eckert K, Moore M, et al. Evaluation of a proposed method for representing drug terminology. In: AMIA 1999: Proceedings of the AMIA Annual Symposium; 1999; Washington, DC. Philadelphia: Hanley & Belfus; 1999. pp 47-51.
17. Bowles K. Application of the Omaha system in acute care. Res Nurs Health 2000; 23: 93-105.
18. Cimino J, Patel V, Kushniruk A. Studying the human-computer-terminology interface. J Am Med Inform Assoc 2001; 8 (2): 163-73.
19. Hardiker N, Rector A. Structural validation of nursing terminologies. J Am Med Inform Assoc 2001; 8 (3): 212-21.
Correspondence to:
D. G. T. Arts
Academic Medical Center
Department of Medical Informatics
P.O. Box 22700
1100 DE Amsterdam
The Netherlands
E-mail: arts@oebig.at

20. Schulz S, Hahn U. Medical knowledge reengineering – converting major portions of the UMLS into a terminological knowledge base. Int J Med Inf 2001; 64: 207-21.
21. Bodenreider O. Circular hierarchical relationships in the UMLS: etiology, diagnosis, treatment, complications and prevention. In: Bakken S, editor. AMIA 2001: Proceedings of the AMIA Annual Symposium; 2001; Washington, USA. Philadelphia: Hanley & Belfus; 2001. pp 57-61.
22. Bodenreider O, Mitchell J, McCray A. Evaluation of the UMLS as a terminology and knowledge resource for biomedical informatics. In: Kohane I, editor. AMIA 2002: Proceedings of the AMIA Annual Symposium; 2002; San Antonio, USA. Philadelphia: Hanley & Belfus; 2002. pp 61-65.
23. Bodenreider O, Burgun A, Rindflesch T. Assessing the consistency of a biomedical terminology through lexical knowledge. Int J Med Inf 2002; 67: 85-95.
24. Cornet R, Abu-Hanna A. Evaluation of a frame-based ontology. A formalization-oriented approach. In: Engelbrecht R, Surján G, McNair P, editors. MIE2002: Proceedings of Medical Infobahn Europe; 2002; Budapest. Amsterdam: IOS Press; 2002. pp 488-93.
25. Moss J, Coenen A, Etta Mills M. Evaluation of the draft international standard for reference terminology model for nursing actions. J Biomed Inform 2003; 36: 271-8.
26. Hardiker N. Logical ontology for mediating between nursing intervention terminology systems. Methods Inf Med 2003; 42: 265-70.
27. Hardiker N. Determining sources for formal nursing terminology systems. J Biomed Inform 2003; 36: 279-86.
28. Bodenreider O. Strength in numbers: Exploring redundancy in hierarchical relations across biomedical terminologies. In: Musen M, Friedman C, Teich J, editors. AMIA 2003: Proceedings of the AMIA Annual Symposium; 2003; Washington, USA. Philadelphia: Hanley & Belfus; 2003. pp 101-5.
29.
Warnekar P, Carter J. HIV terms coverage by a commercial nomenclature. In: Musen M, Fried- man C, Teich J, editors. AMIA2003: Proceedings oftheAMIAAnnualSymposium;2003;Washing- ton, DC. Philadelphia: Hanley & Belfus; 2003. p 1046. 30. Wasserman H, Wang J. An applied evaluation of SNOMED CT as a clinical vocabulary for the computerized diagnosis and problem list. In: Musen M, Friedman C, Teich J, editors. AMIA2003: Proceedings of the AMIA Annual Symposium; 2003; Washington, DC. Philadel- phia: Hanley & Belfus; 2003. pp 699-703. 31. Brown S,Elkin P,Rosenbloom S,Husser C,Bauer B, Lincoln M, et al. VA national drug reference terminology: A cross-institutional content cover- age study. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, USA. Amsterdam: IOS Press; 2004. pp 477-81. 32. Penz J, Brown S, Carter J, Elkin P, NguyenV, Sims S. Evaluation of SNOMED coverage of Veterans HealthAdministration terms. In: Fieschi M, Coie- ra E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, CA.Amsterdam: IOS Press; 2004. pp 540-4. 33. Ceusters W, Smith B, Kumar A, Dhaen C. Ontol- ogy-based error detection in SNOMED-CT. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medi- cal informatics; 2004; San Fransisco, USA. Am- sterdam: IOS Press; 2004. pp 482-6. 34. Rothwell D. SNOMED-based knowledge repre- sentation. Meth Inform Med 1995; 34: 209-13. 35. Rector A, Nowlan W. The GALEN project. Com- put Methods Programs Biomed 1994; 45 (1-2): 75-8. 36. Young J, Goldfrad C, Rowan K. Development and testing of a hierarchical method to code the reason for admission to intensive care units: the ICNARC coding method. Br J Anaesth 2001; 87 (4): 543-8. 37. Yeh I, Karp P, Noy N, Altman R. Knowledge ac- quisition, consistency checking and concurrency control for Gene Ontology (GO). 
Bioinformatics 2003; 19 (2): 241-8. 38. Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF.The Description Logic Hand- book: Theory, Implementation, and Applications. Cambridge: University Press; 2003. 39. Haarslev V, Möller R. RACER System Descrip- tion.In:MassacciF,editor.IJCAR2001:Proceed- ings of the International Joint Conference for Automated Reasoning; 2001; Sienna. Springer- Verlag; 2001. pp 701-6. 40. Cornet R, Abu-Hanna A. Using Description Logics for Managing Medical Terminologies. In: Barahona P, editor. 9th Conference on Artificial Intelligence in Medicine in Europe, AIME; 2003. Protaras, Cyprus: Springer; 2003. pp 61-70. 41. Hales J, Schoeffler K. Barriers to evaluation of clinical vocabularies. In: Cesnik B, editor. Medin- fo 1998: Proceedings of the 5th world congress on medical informatics; 1998, Seoul. Amsterdam: IOS Press; 1998. pp 680-4. 42. Schlobach S, Cornet R. Logical Support for Ter- minologicalModeling.In:FieschiM,CoieraE,Li J, editors. Medinfo 2004: Proceedings of the 11th worldcongressonmedicalinformatics;2004;San Francisco, CA. Amsterdam: IOS Press; 2004. pp 439-43. 43. Schlobach S, Cornet R. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. In: Cohn A, editor. IJCAI2003: Proceedings of the International Joint Conference on Artificial Intelligence; 2003; Acapulco, Mexi- co. San Fransisco: Morgan Kaufmann Publishers; 2003. 44. Prins A, Arts D, Cornet R, de Keizer N. Internet- based terminological knowledge maintenance. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, CA. Amsterdam: IOS Press; 2004. p 1820. 45. Arts D, Cornet R, De Jonge E, De Keizer N. Com- parison of methods for evaluation of medical ter- minological systems. In: Musen M, Friedman C, Teich J, editors. AMIA2003: Proceedings of the AMIA Annual Symposium; 2003; Washington, DC.Philadelphia:Hanley&Belfus;2003.p779. 46. 
Arts D, De Keizer N, De Jonge E, Cornet R. Com- parisonofmethodsforevaluationofamedicalter- minological system. In: Fieschi M, Coiera E, Li J, editors. Medinfo 2004: Proceedings of the 11th world congress on medical informatics; 2004; San Francisco, CA.Amsterdam: IOS Press; 2004. pp 467-71. Methods Inf Med 5/2005 625 Methods for Evaluation of Medical Terminological Systems