Annotating Evidence
why, what, how
Lessons learnt
Alexander Garcia, PhD & Deirdre Beecher, MSc
Some information on the slides has been taken from the following source:
Alexander Garcia Castro. Developing ontologies in the biological domain.
A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland,
Institute for Molecular Bioscience (September 2007).
Why are we annotating?
Moving from Static to Dynamic
We want to…
• Facilitate asking questions of the Cochrane Library
• In simple terms, using vocabularies and relations that researchers and clinicians understand
• For example: What treatments are effective for acute alcohol withdrawal syndrome (AWS) not
adequately controlled with benzodiazepines?
• Enrich content
• By identifying and relating concepts to existing ontologies
• By identifying and structuring evidence
• Contribute to semantic interoperability in health care and life sciences
• We are exposing evidence that machines can process
• Linking Cochrane evidence to external sources, and vice versa
• Our technology: Linked Data
• Third parties will be able to easily consume our content
What are we annotating?
From the whole to the part
PICO
• P: Patient, Population
• How would I describe a group of
patients similar to mine?
• I: Intervention
• Which main intervention,
prognostic factor, or exposure am
I considering?
• C: Comparison
• What is the main alternative to
compare with the intervention?
• O: Outcome
• What would I like to measure
or achieve? What can I hope to
accomplish, measure, improve, or
affect?
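The four PICO parts can be held in a simple record. The sketch below is only an illustration: the class name `PicoAnnotation`, its field names, and the `missing_parts` helper are assumptions, not part of any Cochrane tool. The example values come from the alcohol withdrawal review used in these slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PicoAnnotation:
    """One PICO annotation for a review (names are illustrative assumptions)."""
    population: List[str] = field(default_factory=list)
    intervention: List[str] = field(default_factory=list)
    comparison: List[str] = field(default_factory=list)
    outcome: List[str] = field(default_factory=list)

    def missing_parts(self) -> List[str]:
        """Return the PICO letters that have no annotation yet."""
        parts = {
            "P": self.population,
            "I": self.intervention,
            "C": self.comparison,
            "O": self.outcome,
        }
        return [letter for letter, values in parts.items() if not values]


# Values taken from the alcohol withdrawal example in these slides.
aws = PicoAnnotation(
    population=["Alcohol dependent patients"],
    intervention=["Benzodiazepines alone or in combination with other drugs"],
    comparison=["Placebo", "Other pharmacological interventions"],
)

# The outcome has not been annotated yet, so "O" is reported as missing.
print(aws.missing_parts())
```

A check like `missing_parts` is one way to flag reviews whose annotation is still incomplete before they are published as linked data.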
What are we annotating?
PICO annotation model (diagram)
• Population: Condition; Sex (male, female, or both); Age (MeSH category)
• Intervention and Comparison: Materials; Procedure; Schedule (how often), Dose (how much), Duration (how long), each with a unit value and a unit
• Outcome: Type; Free text
• Classifications attached to the Type nodes: COMET classification; Davey et al classification
• Vocabularies shown: OHDSI vocabulary (selected), WHO-ATC, RxNorm, SNOMED-MedDRA
• Example questions: “Pregnant, aged between 16 and 28 with preeclampsia”; “Dose regimes for administration of Magnesium sulfate”; “What outcomes are there for this comparison?”; “Steroids vs what? Is better for baby’s lungs?”
DOI: 10.2353/jmoldx.2009.090037
CITATION-ID: 39306481
JOURNAL-TITLE: The Journal of Molecular Diagnostics
JOURNAL-CITE-ID: 58207
BOOK-CITE-ID:
SERIES-ID:
DEPOSIT-TIMESTAMP: 20110701073812000
OWNER: 10.1016
LAST-UPDATE: 2011-07-04 21:13:59
PRIME-DOI: none
Metadata
Indirectly about the content
Describes a file
Is this enough for answering a
question?
The Semantic Web
The Semantic Web is an extension of the current web in which information is
given well-defined meaning, better enabling computers and people to work in
co-operation.
diagnosis of chronic asthma
children and adults
BDP versus BUD delivered by oral inhalation
Oropharyngeal side effects (hoarseness, sore throat, oral
Candidiasis)
In vitro studies assessing
pharmacodynamic properties have
shown differences between BDP and
BUD
Male and Female, Child, Preschool 2-5 years and Child
6-12 years and Adolescent 13-18 years and Young
Adult 19-24 years and Adult 19-44 years and Middle
Aged 45-64 years and Aged 65-79 years and Aged, 80
and over 80+ years: Asthma;
“Studies of both children and adults were
included, but patients under two years of age
were excluded. To be eligible, participants had
to have a diagnosis of chronic asthma. Studies
conducted in both primary and secondary care
settings were considered.”
Interventions:
[Pharmacological] Beclometasone
Beclomethasone dipropionate
versus budesonide delivered by oral inhalation
Efficacy related
Clinic measured FEV1 and PEF, diary card morning and
evening PEF, diurnal variability in PEF
Safety related
Hypothalamo-pituitary-adrenal (HPA) axis function
reflected in serum and urinary cortisol measures and
clinical adrenal insufficiency
P
I
C
O
Each drug had to be delivered at the same
nominal daily dose. Nominal dose was
calculated as the valve dose multiplied by the
number of actuations per day….
BDP versus BUD delivered by oral inhalation
Physiological or clinical - lung function
Resource use - Asthma exacerbations:
hospital admission, emergency room
attendance, unscheduled primary care
visits, days off school or work
- Experimental intervention
Benzodiazepines alone or in combination
with other drugs
- Control Intervention
Placebo; Other pharmacological
interventions
[No active intervention]
Placebos:
Physiological or clinical - Alcohol
Withdrawal Syndrome;
Alcohol dependent patients diagnosed in
accordance with appropriate standardized
criteria (e.g., criteria of Diagnostic and Statistical
Manual of Mental Disorders (DSM-IV-R) or
ICD)…NOT mental Disability AND condition
Alcoholism
What treatments are
effective for acute alcohol
withdrawal syndrome (AWS)
not adequately controlled
with benzodiazepines?
Annotations and Evidence
Finding the evidence
From Concepts to Web-Knowledge
concept = “An abstract entity signifying a general characterizing idea or universal which acts as a category for instances. The unit
of semantics (meaning), the node in some mental or knowledge organization system.”
Diclofenac
A non-steroidal anti-inflammatory agent (NSAID) with
antipyretic and analgesic actions. It is primarily
available as the sodium salt. [PubChem]
has evidence for reactions Has adverse effects
Diclofenac-K sachet 50 mg, n =
291 DrugBank
Has approved prescription
products
Adverse reactions to nonsteroidal anti-inflammatory drugs.
Diclofenac compared with other nonsteroidal anti-
inflammatory drugs. PMC
nausea, upper stomach
pain, itching, loss of
appetite, dark urine, clay-
colored stools, jaundice
(yellowing of the skin or
eyes) TGA
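The Diclofenac slide links one concept to statements held by several external sources. A minimal sketch of that idea follows, using plain tuples in place of a real RDF store. The `Statement` type and the helper functions are assumptions, and the pairing of each relation with its target and source is an assumed reading of the flattened slide.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # the external resource that holds the statement


# Assumed pairing of relations, targets and sources (the slide layout was lost).
STATEMENTS = [
    Statement("Diclofenac", "has approved prescription products",
              "Diclofenac-K sachet 50 mg", "DrugBank"),
    Statement("Diclofenac", "has adverse effects", "nausea", "TGA"),
    Statement("Diclofenac", "has adverse effects", "jaundice", "TGA"),
    Statement("Diclofenac", "has evidence for reactions",
              "Adverse reactions to nonsteroidal anti-inflammatory drugs", "PMC"),
]


def objects_of(subject: str, predicate: str) -> List[str]:
    """Return every object linked to the subject by the given relation."""
    return [s.obj for s in STATEMENTS
            if s.subject == subject and s.predicate == predicate]


def sources_of(subject: str) -> List[str]:
    """Return the sorted, de-duplicated sources that say something about the subject."""
    return sorted({s.source for s in STATEMENTS if s.subject == subject})


print(objects_of("Diclofenac", "has adverse effects"))
print(sources_of("Diclofenac"))
```

Keeping the source next to each statement is what lets a consumer trace a piece of evidence back to where it came from.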
Often confused with controlled vocabularies (CVs)
Ontologies are constantly evolving
Ontologies facilitate classification, inference, and reasoning about data
Representation of reality
They are related to data; they represent data and entities from reality
May be built de novo; very often they are built by reusing existing resources
Ontology Technology
• “Ontology” covers a range of things
• Controlled vocabularies – e.g. MeSH
• Linguistic structures – e.g. WordNet
• Hierarchies (with bells and whistles) – e.g. Gene Ontology
• Frame representations – e.g. FMA
• Description logic formalisms – e.g. SNOMED CT, GALEN, OWL-DL based
ontologies
• Philosophically inspired – e.g. OntoClean and SUMO
Description Logics
• What the logicians made of Frames
• Greater expressivity and semantic precision
• Compositional definitions
• “Conceptual Lego” – define new concepts from old
• To allow automatic classification & consistency checking
• The mathematics of classification is tricky
• Some seriously counter-intuitive results
• The basics are simple – devil in the detail
A simple ontology: Animals
Classes (diagram): Living Thing, Plant, Grass, Tree, Animal, Person, Cow, Carnivore, Herbivore, Body Part, Arm, Leg
Relations (diagram): eats, has part
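The class names on this slide can be arranged into a toy hierarchy to show what "define new concepts from old" and automatic classification mean. This is a sketch under stated assumptions and not a description logic reasoner: the subclass links, the single `eats` link, and the definition of a herbivore used here are guesses, since the diagram's arrows did not survive.

```python
from typing import Dict, Optional, Set

# child class -> parent class. Assumed links; the slide's arrows were lost.
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "Living Thing": None,
    "Plant": "Living Thing",
    "Animal": "Living Thing",
    "Grass": "Plant",
    "Tree": "Plant",
    "Person": "Animal",
    "Cow": "Animal",
    "Body Part": None,
    "Arm": "Body Part",
    "Leg": "Body Part",
}

# "eats" links between classes. Assumed for illustration.
EATS: Dict[str, Set[str]] = {"Cow": {"Grass"}}


def ancestors(cls: str) -> Set[str]:
    """Collect every superclass of cls by walking the subclass links upwards."""
    found: Set[str] = set()
    parent = SUBCLASS_OF.get(cls)
    while parent is not None:
        found.add(parent)
        parent = SUBCLASS_OF.get(parent)
    return found


def is_a(cls: str, target: str) -> bool:
    """True if cls is the target class or one of its subclasses."""
    return cls == target or target in ancestors(cls)


def is_herbivore(cls: str) -> bool:
    """Assumed definition: an Animal that eats something, and only Plants."""
    foods = EATS.get(cls, set())
    return is_a(cls, "Animal") and bool(foods) and all(is_a(f, "Plant") for f in foods)


print(is_herbivore("Cow"))     # Cow is classified from its definition, not by hand
print(is_herbivore("Person"))  # no "eats" link is recorded for Person
```

Nobody states that Cow is a Herbivore; it falls out of the definition. That is the compositional idea behind description logics, here reduced to a few lines.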
What is an Ontology?
“An ontology is a formal, explicit specification of a shared
conceptualization”
• Machine readable
• Concepts, properties, relations, functions, constraints, and axioms are explicitly
defined
• Consensual Knowledge
• Abstract model and simplified view of some phenomenon in the world that
we want to represent
MedDRA
SNOMED
WHO
RxNorm
Cochrane Knowledge Graph
WHO-ATC drugs.
Mostly a hierarchical
classification built
upon parent-child
relations
RxNorm provides
normalized names
for clinical drugs and
links its names to
many of the drug
vocabularies
commonly used in
pharmacy software
Medical Dictionary for Regulatory
Activities. Clinically validated
international medical terminology
dictionary (and thesaurus) used by
regulatory authorities in the
pharmaceutical
SNOMED CT (Systematized Nomenclature of Medicine --
Clinical Terms) is a standardized, multilingual vocabulary
of clinical terminology that is used by physicians and
other health care providers for the electronic exchange of
clinical health information
Interventions
Conditions
Population
Comparisons
ATC Main Groups (14)
A - Alimentary Tract and Metabolism
B - Blood and Blood Forming Organs
C - Cardiovascular System
D - Dermatologicals
G - Genito Urinary System and Sex Hormones
H - Systemic Hormonal Preparations, Excl. Sex Hormones and Insulins
J - Antiinfectives for Systemic Use
L - Antineoplastic and Immunomodulating Agents
M - Musculo-skeletal System
N - Nervous System
P - Antiparasitic Products, Insecticides and Repellents
R - Respiratory System
S - Sensory Organs
V - Various
The active substances are
divided into different groups
according to the organ or
system on which they act and
their therapeutic,
pharmacological and chemical
properties.
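Because the first character of an ATC code names its main group, the list above can be used as a lookup table. The sketch below assumes nothing beyond that list; the function name `main_group` is illustrative.

```python
# The 14 ATC main groups listed on this slide, keyed by their letter.
ATC_MAIN_GROUPS = {
    "A": "Alimentary Tract and Metabolism",
    "B": "Blood and Blood Forming Organs",
    "C": "Cardiovascular System",
    "D": "Dermatologicals",
    "G": "Genito Urinary System and Sex Hormones",
    "H": "Systemic Hormonal Preparations, Excl. Sex Hormones and Insulins",
    "J": "Antiinfectives for Systemic Use",
    "L": "Antineoplastic and Immunomodulating Agents",
    "M": "Musculo-skeletal System",
    "N": "Nervous System",
    "P": "Antiparasitic Products, Insecticides and Repellents",
    "R": "Respiratory System",
    "S": "Sensory Organs",
    "V": "Various",
}


def main_group(atc_code: str) -> str:
    """Return the main group name for an ATC code, based on its first letter."""
    letter = atc_code.strip().upper()[:1]
    try:
        return ATC_MAIN_GROUPS[letter]
    except KeyError:
        raise ValueError(f"Unknown ATC main group in code {atc_code!r}") from None


print(main_group("J01GB03"))
print(main_group("D06AX07"))
```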
SNOMED
CORE
• Concepts
– Concept Ids – meaningless
machine-processable numbers
• Descriptions (Terms)
– Human-processable terms
• Relationships
– Between concepts: source to
destination
Terms
• Fully Specified Name (FSN)
• Preferred Term (PT)
• Acceptable (Terms) – synonyms
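The three SNOMED components (concepts, descriptions, relationships) can be pictured as one record per concept. This is a sketch only and not the SNOMED CT release format. The class and field names are assumptions, and so is the wording of the Fully Specified Name in the example; the identifier and the synonym come from the teratozoospermia example in these slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Concept:
    """A concept with its descriptions and outgoing relationships (illustrative)."""
    concept_id: int                 # meaningless, machine-processable number
    fully_specified_name: str       # FSN
    preferred_term: str             # PT
    synonyms: List[str] = field(default_factory=list)  # acceptable terms
    # Each relationship is (relationship type, destination concept id).
    relationships: List[Tuple[str, int]] = field(default_factory=list)

    def all_terms(self) -> List[str]:
        """Every human-readable term attached to this concept."""
        return [self.fully_specified_name, self.preferred_term, *self.synonyms]

    def matches(self, text: str) -> bool:
        """Case-insensitive match of text against any term of the concept."""
        wanted = text.strip().lower()
        return any(term.lower() == wanted for term in self.all_terms())


teratozoospermia = Concept(
    concept_id=236817003,                               # SCTID given in the slides
    fully_specified_name="Teratozoospermia (finding)",  # assumed FSN wording
    preferred_term="Teratozoospermia",
    synonyms=["Teratospermia"],
    # No relationships are filled in: the slides do not list any for this concept.
)

print(teratozoospermia.matches("teratospermia"))
```

Matching against every description, not just the preferred term, is what lets an annotator find the concept by whichever name the review happens to use.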
ID: CD000013
Title: Amnioinfusion for potential or suspected umbilical cord compression in labour
Part of speech: “Women whose babies were considered to be at increased risk of, or had FHR patterns suggestive of, umbilical cord compression in labour.”
Field and annotation:
• Sex: Female
• Age range: Adolescent 13-18 years AND Young Adult 19-24 years AND Adult 19-44 years
• Condition: Labor Finding AND Compression Of Umbilical Cord
When the sex is NOT specified in the review methods, assume male and female.
If the age range is NOT specified (for instance, CD000013), choose adolescent
plus adult.
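The two defaulting rules above can be sketched as a small helper. The function name, the dictionary keys, and the exact age category labels chosen for "adolescent plus adult" are assumptions made for illustration; the CD000013 annotation also lists a Young Adult category, so the real default set may be wider.

```python
from typing import Dict, List, Optional

DEFAULT_SEX = ["Male", "Female"]
# Assumed category labels for "adolescent plus adult".
DEFAULT_AGE_RANGE = ["Adolescent 13-18 years", "Adult 19-44 years"]


def annotate_population(
    sex: Optional[List[str]] = None,
    age_range: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Fill in sex and age range, falling back to the defaults when unspecified."""
    return {
        "sex": list(sex) if sex else list(DEFAULT_SEX),
        "age_range": list(age_range) if age_range else list(DEFAULT_AGE_RANGE),
    }


# The review states the sex but not the age range, so only the age is defaulted.
print(annotate_population(sex=["Female"]))
```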
About condition. Annotating clinical conditions can sometimes be
difficult for two reasons: i) the condition name is not directly mentioned in
the document, or ii) identical terms for a condition appear in several
vocabularies. The two scenarios below illustrate these issues.
Annotating P
What if the condition name is not directly mentioned?
“measuring the percentage of abnormal spermatozoids”
and “shape of the spermatozoid”
refer to the condition “teratozoospermia”,
a.k.a. “teratospermia”.
This condition may be characterized by
“teratozoospermia”, SNOMED ID (SCTID: 236817003),
or by “Sperm Morphology – No normal forms”,
SNOMED ID (SCTID: 167796003).
Start with the internal PICO browser and the vocab browser.
They let the metadata specialist inspect the vocabularies loaded
into the tool: siblings, broader terms, and narrower terms are
displayed. For instance, for “teratozoospermia”, siblings
such as “sperm morphology – no normal forms” are
displayed.
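What the browser displays (siblings, broader and narrower terms) can be derived from a single term-to-parent table. The sketch below is not the PICO browser's implementation. The parent term is a hypothetical placeholder, since the slides name the two sibling terms but not their shared parent.

```python
from typing import Dict, List, Optional

# Hypothetical parent term; the slides do not name it.
PARENT = "Sperm morphology finding (hypothetical parent)"

# term -> broader term
BROADER: Dict[str, Optional[str]] = {
    PARENT: None,
    "teratozoospermia": PARENT,
    "sperm morphology - no normal forms": PARENT,
}


def broader(term: str) -> Optional[str]:
    """Return the broader (parent) term, or None for a top-level term."""
    return BROADER[term]


def narrower(term: str) -> List[str]:
    """Return the direct children of a term, sorted by name."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


def siblings(term: str) -> List[str]:
    """Return the other terms that share the same broader term."""
    parent = BROADER.get(term)
    if parent is None:
        return []
    return [t for t in narrower(parent) if t != term]


print(siblings("teratozoospermia"))
```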
What if multiple identical terms are displayed, or it is
impossible to know which term would be best to use?
“umbilical cord
compression” is
available in two
vocabularies: SNOMED-
CT and MedDRA.
Vocab.: SNOMED-CT
Hierarchy: Complication > Complication of pregnancy, childbirth and/or the puerperium > Umbilical cord complication > Compression of umbilical cord
Vocab.: MedDRA
Hierarchy: Pregnancy, puerperium and perinatal conditions > Neonatal and perinatal conditions > Umbilical cord complications > Umbilical cord compression
According to the hierarchy, both terms are
suitable. In this case, the metadata
specialist chooses the SNOMED-CT term. Why?
Annotating the Intervention
Gentamicin
• D06AX07 gentamicin
• J01GB03 gentamicin
• S01AA11 gentamicin
• S02AA14 gentamicin
• S03AA06 gentamicin
Which one should I use?
• Is it a dermatological
intervention?
• Is it otological?
• Is it ophthalmological?
• Is this about a systemic use of
the antibiotic?
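The questions above map onto the structure of the ATC code itself. Each 7-character code has five levels, and for gentamicin the second level already separates the five uses. The sketch below splits a code into its levels and picks a code by intended use. The use labels and function names are illustrative assumptions; the codes are the ones listed on this slide.

```python
from typing import Dict, List


def atc_levels(code: str) -> List[str]:
    """Split a 7-character ATC code into its five hierarchical levels."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"Expected a 7-character ATC code, got {code!r}")
    return [code[:1], code[:3], code[:4], code[:5], code]


# Gentamicin codes from the slide, labelled with the use their group implies.
GENTAMICIN_CODES: Dict[str, str] = {
    "D06AX07": "dermatological",
    "J01GB03": "systemic",
    "S01AA11": "ophthalmological",
    "S02AA14": "otological",
    "S03AA06": "ophthalmological and otological",
}


def codes_for_use(use: str) -> List[str]:
    """Return the gentamicin codes whose label matches the intended use."""
    wanted = use.strip().lower()
    return [code for code, label in GENTAMICIN_CODES.items() if label == wanted]


print(atc_levels("J01GB03"))
print(codes_for_use("otological"))
```

In practice the annotator answers the question from the review text (route and site of administration) and only then selects the code, which is why the lookup is keyed by use and not by drug name.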