3. The task: Medication Extraction
Given
Discharge reports
Wanted
Medication mention
Dose
Mode of application
Frequency
Duration
Reason
List/narrative
Other
Event
Temporal
Certainty
5. Regulations/requirements
Medical requirements
Drug taken by patient
No allergies
No food, water, diet, tobacco, alcohol, illicit drugs
Linguistic requirements
Reason: the most informative base adjective phrase or the longest base noun phrase
6. Required output
Event-based annotation
Repeat individual mention for each event
“Aspirin for headache and for leg pain”
Aspirin … headache
Aspirin … leg pain
Semantic-level expectations
NITROGLYCERIN 1/150 ( 0.4 MG ) 1 TAB SL q5min x 3
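The event-based rule above (one output entry per event, with the medication mention repeated) can be sketched as follows. This is a minimal illustration only: the function name `expand_events` and the dictionary keys `m` (medication) and `r` (reason) are hypothetical and are not taken from the original system.

```python
def expand_events(medication, reasons):
    """Return one entry per (medication, reason) pair.

    A medication with no stated reason still yields a single entry.
    """
    if not reasons:
        return [{"m": medication, "r": None}]
    return [{"m": medication, "r": reason} for reason in reasons]


# "Aspirin for headache and for leg pain" gives two entries.
entries = expand_events("Aspirin", ["headache", "leg pain"])
for entry in entries:
    print(entry["m"], "...", entry["r"])
```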
7. Training and test data
Ground Truth, 27 records
Manually annotated by “PG students”
Scrutinised by the community
Relative f-score: ~60%
Unannotated training data: 620
Test data: 260
9. Preprocessing
Split sentences
A sentence and paragraph breaker
NaCTeM: sptoolkit.jar
POS tagging
A part-of-speech tagger for English
Tsujii: postagger
Parsing (chunking)
CFG parser
Tsujii: chunkparser
10. Rules
Medication Dictionary (> 1000)
Morphological: medication affix (> 100)
-bicine, -caine, etc.
Precedes a mode
Inhaler, supplement, etc.
Medication type
Cardiac, cardiovascular (~100)
Symptoms (~100)
Chest discomfort, etc.
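A rough sketch of how the first three rule types on this slide (dictionary lookup, medication affix, "precedes a mode") might be combined. The word lists here are tiny stand-ins for the real resources, the suffix spellings are copied from the slide, and the function name `is_medication` is hypothetical.

```python
# Toy stand-ins: the slide mentions > 1000 dictionary entries and > 100 affixes.
MEDICATION_DICTIONARY = {"aspirin", "insulin", "nitroglycerin"}
MEDICATION_SUFFIXES = ("bicine", "caine")   # from "-bicine, -caine, etc."
MODE_CUES = {"inhaler", "supplement"}       # from "Inhaler, supplement, etc."


def is_medication(tokens, i):
    """Apply the three rules to the token at position i."""
    word = tokens[i].lower()
    if word in MEDICATION_DICTIONARY:        # rule 1: dictionary lookup
        return True
    if word.endswith(MEDICATION_SUFFIXES):   # rule 2: medication affix
        return True
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    return next_word in MODE_CUES            # rule 3: precedes a mode


tokens = "Patient given lidocaine and albuterol inhaler".split()
found = [t for i, t in enumerate(tokens) if is_medication(tokens, i)]
print(found)
```

In this sketch "lidocaine" is caught by the suffix rule and "albuterol" by the following mode word; neither is in the toy dictionary.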
11. Word lists and regular expressions
Dosage, mode, frequency
Duration (While, for, etc.)
Reason
Head
Diseases
Symptoms (pain, agitation, etc.) ~20
Affixes (hyper-, -emia, etc.)
Modifier (acute, chronic, etc.) <100
Time phrases, Body parts
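As an illustration of the word-list and regular-expression approach for dosage, mode and frequency, the sketch below uses a few hand-picked units and abbreviations. The patterns and their names (`DOSE_RE`, `MODE_RE`, `FREQ_RE`) are assumptions made for this example; they are neither the original lists nor a reproduction of the gold annotation.

```python
import re

# Small illustrative vocabularies; a real system would use much longer lists.
DOSE_RE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml|tab|tabs|units?)\b", re.I)
MODE_RE = re.compile(r"\b(?:po|sl|iv|im|sc|topical)\b", re.I)
FREQ_RE = re.compile(r"\b(?:q\d+(?:min|h)|qd|bid|tid|qid|prn|daily)\b", re.I)

text = "NITROGLYCERIN 1/150 ( 0.4 MG ) 1 TAB SL q5min x 3"

doses = DOSE_RE.findall(text)
modes = MODE_RE.findall(text)
frequencies = FREQ_RE.findall(text)

print(doses, modes, frequencies)
```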
12. Producing output
Remove allergies
Remove laboratory results
Merge labels
<m>INSULIN</m> <m>GLARGINE</m>
<f>after dialysis</f> on <f>Monday</f> <f>Wednesday</f><f>Friday</f>
Remove negated medications
“patient instructed not to take Viagra.”
etc.
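Two of the post-processing steps on this slide, merging labels and removing negated medications, might look roughly like this. The sketch assumes that only directly adjacent tags of the same type are merged and that negation is detected from a short cue list appearing before the mention; both choices, the cue list, and the function names are hypothetical.

```python
import re

# A closing tag followed (optionally after whitespace) by the same opening tag.
ADJACENT_TAGS = re.compile(r"</(\w+)>(\s*)<\1>")

# Hypothetical negation cues; the original list is not given on the slide.
NEGATION_CUES = ("not to take", "do not take")


def merge_labels(tagged):
    """Join neighbouring spans that carry the same label into one span."""
    return ADJACENT_TAGS.sub(lambda m: m.group(2) or " ", tagged)


def is_negated(sentence, medication):
    """True if a negation cue occurs before the medication mention."""
    lowered = sentence.lower()
    position = lowered.find(medication.lower())
    if position == -1:
        return False
    return any(cue in lowered[:position] for cue in NEGATION_CUES)


print(merge_labels("<m>INSULIN</m> <m>GLARGINE</m>"))
print(is_negated("patient instructed not to take Viagra.", "Viagra"))
```

Note that this version does not merge across untagged words such as "on" in the frequency example; whether the original system did so is not stated.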
13. Evaluation process
Small training data (27)
Organisers
Community
Gold standard test data (260)
Annotated by participants
Merge and tie-break
Community
Silver data (620)
Voting
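The slide says only that the silver data was produced by voting. One simple possibility is a majority vote over the participating systems' outputs, sketched below. The threshold, the entry representation (medication, reason) and the function name `vote` are assumptions for illustration, not the organisers' documented procedure.

```python
from collections import Counter


def vote(system_outputs, min_votes=None):
    """Keep every entry proposed by at least min_votes systems.

    By default an entry needs a strict majority of the systems.
    """
    if min_votes is None:
        min_votes = len(system_outputs) // 2 + 1
    counts = Counter(entry for output in system_outputs for entry in set(output))
    return {entry for entry, n in counts.items() if n >= min_votes}


system_a = {("aspirin", "headache"), ("insulin", None)}
system_b = {("aspirin", "headache")}
system_c = {("aspirin", "headache"), ("insulin", None), ("viagra", None)}

silver = vote([system_a, system_b, system_c])
print(sorted(silver, key=str))
```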
14. Evaluation on ground truth
Matching  Orientation  Level         Field  Score
inexact   horizontal   systemlevel   X      0.8776
inexact   horizontal   patientlevel  X      0.8928
inexact   vertical     systemlevel   do     0.9150
inexact   vertical     patientlevel  do     0.9160
inexact   vertical     systemlevel   f      0.9172
inexact   vertical     patientlevel  f      0.9197
inexact   vertical     systemlevel   mo     0.9441
inexact   vertical     patientlevel  mo     0.9471
inexact   vertical     systemlevel   m      0.9544
inexact   vertical     patientlevel  m      0.9519
inexact   vertical     systemlevel   r      0.5260
inexact   vertical     patientlevel  r      0.3876
inexact   vertical     systemlevel   du     0.7958
inexact   vertical     patientlevel  du     0.5846
15. Preliminary evaluation on test data
Matching  Orientation  Level         Field  Score
inexact   horizontal   systemlevel   X      0.7847
inexact   horizontal   patientlevel  X      0.7755
inexact   vertical     systemlevel   do     0.8267
inexact   vertical     patientlevel  do     0.8155
inexact   vertical     systemlevel   f      0.8349
inexact   vertical     patientlevel  f      0.8289
inexact   vertical     systemlevel   mo     0.8359
inexact   vertical     patientlevel  mo     0.8256
inexact   vertical     systemlevel   m      0.8533
inexact   vertical     patientlevel  m      0.8541
inexact   vertical     systemlevel   r      0.3881
inexact   vertical     patientlevel  r      0.3883
inexact   vertical     systemlevel   du     0.51
inexact   vertical     patientlevel  du     0.4969