EFFECTIVE AUTOMATED CLASSIFICATION USING ONTOLOGY-BASED ANNOTATION:
EXPERIENCE WITH ANALYSIS OF ADVERSE EVENT REPORTS
Mélanie Courtot, mcourtot@gmail.com
Current: PhD Candidate, Terry Fox Laboratory, BC Cancer Agency
Starting April 14th 2014: PDF, MBB Dept., Simon Fraser University
(and affiliation with BC Public Health Microbiology and Research
Laboratory).
Background and problem statement
• Surveillance of Adverse Events Following
Immunization is important
•  Detection of issues with vaccines
•  Importance of vaccine-risk communication
• Analysis of AE reports is a subjective, time-consuming, and costly process
•  Manual review of the textual reports
Hypothesis
[Diagram: health agencies, clinicians, the general population, data repositories, SOPs, and the Brighton and other guidelines connect to the Adverse Event Reporting Ontology (AERO) and its Brighton annotations through three steps: (1) guideline representation, (2) information recall, and (3) data integration & answering queries, enabling automatic case classification.]
Encoding the Brighton guidelines in OWL allows automated classification of adverse events at similar accuracy.
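To make the hypothesis concrete, the minimal sketch below (Python with owlready2) shows what encoding a guideline criterion in OWL could look like. The class and property names, and the grossly simplified "one dermatological plus one cardiovascular finding" criterion, are illustrative assumptions, not the actual Brighton case definition or the AERO encoding.

# A minimal, illustrative sketch of encoding a Brighton-style criterion as an
# OWL class with owlready2. All names below are hypothetical placeholders,
# not the real AERO classes or the full Brighton anaphylaxis definition.
from owlready2 import get_ontology, Thing, ObjectProperty

onto = get_ontology("http://example.org/brighton-sketch.owl")

with onto:
    class AdverseEventReport(Thing): pass
    class ClinicalFinding(Thing): pass
    class DermatologicalCriterion(ClinicalFinding): pass   # placeholder criterion
    class CardiovascularCriterion(ClinicalFinding): pass   # placeholder criterion

    class hasFinding(ObjectProperty):
        domain = [AdverseEventReport]
        range = [ClinicalFinding]

    # Simplified stand-in for a case-definition level: a report with at least
    # one dermatological and one cardiovascular finding is classified as a case.
    class AnaphylaxisCaseSketch(AdverseEventReport):
        equivalent_to = [AdverseEventReport
                         & hasFinding.some(DermatologicalCriterion)
                         & hasFinding.some(CardiovascularCriterion)]

A reasoner can then automatically place any report individual whose asserted findings satisfy the class expression under AnaphylaxisCaseSketch, which is the mechanism the workflow below relies on.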
Test case
• VAERS dataset
• Vaccine Adverse Event Reporting System
• 6032 reports: ~5800 negative, ~230 positive
• Post H1N1 immunization 2009/2010
• Manually classified for anaphylaxis
• MedDRA (Medical Dictionary for Regulatory Activities) is used to represent clinical findings
Example VAERS report
[Screenshot of a VAERS report, showing the free-text part of the report and the MedDRA-encoded structured data.]
Automated Diagnosis workflow
[Workflow diagram, panels A–D: the VAERS dataset (MySQL) and the Brighton annotations (ASCII files loaded into MySQL), with ~800 MedDRA terms mapped to 32 Brighton terms, are exported to OWL/RDF under the Adverse Event Reporting Ontology (AERO); a reasoner classifies the reports, and the classifications are checked against the manually curated dataset.]
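As one hedged illustration of the "~800 MedDRA terms mapped to 32 Brighton terms" step, the sketch below shows how a report's MedDRA preferred terms could be translated into Brighton annotation terms before export to OWL/RDF. The mapping entries are invented placeholders, not the study's actual mapping table.

# Illustrative sketch of the MedDRA -> Brighton annotation step; the mapping
# entries below are placeholders, not the real ~800-term table.
MEDDRA_TO_BRIGHTON = {
    "Urticaria": "dermatological criterion",     # hypothetical mapping
    "Hypotension": "cardiovascular criterion",   # hypothetical mapping
    "Stridor": "respiratory criterion",          # hypothetical mapping
}

def brighton_annotations(meddra_terms):
    """Return the Brighton terms that a report's MedDRA terms map to."""
    return {MEDDRA_TO_BRIGHTON[t] for t in meddra_terms if t in MEDDRA_TO_BRIGHTON}

# Example report coded with three MedDRA preferred terms:
print(brighton_annotations(["Urticaria", "Hypotension", "Pyrexia"]))
# -> {'dermatological criterion', 'cardiovascular criterion'}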
Results
[Same workflow diagram as in the previous slide.]
At the best cut-off point: sensitivity 57%, specificity 97%
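For reference, sensitivity and specificity here come from comparing the reasoner's classifications against the manually curated labels; a small scoring sketch (with toy labels, not the study's data) is shown below.

# Scoring sketch: compare predicted case/non-case flags against the manually
# curated labels. The four example reports below are invented for illustration.
def sensitivity_specificity(predicted, truth):
    tp = sum(p and t for p, t in zip(predicted, truth))              # true positives
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))  # true negatives
    fp = sum(p and (not t) for p, t in zip(predicted, truth))        # false positives
    fn = sum((not p) and t for p, t in zip(predicted, truth))        # false negatives
    return tp / (tp + fn), tn / (tn + fp)

predicted = [True, False, False, True]   # reasoner output (toy)
truth     = [True, True,  False, False]  # manual curation (toy)
sens, spec = sensitivity_specificity(predicted, truth)
print(f"sensitivity={sens:.0%} specificity={spec:.0%}")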
Standardized MedDRA Queries
• SMQs are an existing MedDRA-based screening method
• Retrieval of documents based on the Anaphylaxis SMQ alone is only fair: 54% sensitivity, 97% specificity
• Idea:
•  Identify MedDRA terms that are significantly associated with the diagnosis outcome using contingency tables (sketched below)
•  Augment the existing MedDRA SMQ with those terms
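A hedged sketch of the term-selection step: for each MedDRA term, build a 2x2 contingency table of term presence versus the manually assigned anaphylaxis outcome and test the association. Fisher's exact test is used here as one reasonable choice, and the counts are invented for illustration.

# For one MedDRA term: 2x2 table of (term present / absent) x (positive / negative
# reports). Counts are invented; in the study they would come from the VAERS data.
from scipy.stats import fisher_exact

table = [[40, 300],     # reports containing the term: positive, negative
         [190, 5502]]   # reports without the term:    positive, negative
odds_ratio, p_value = fisher_exact(table)

if p_value < 0.05:
    # Candidate term for augmenting the Anaphylaxis SMQ
    print(f"associated with the outcome: OR={odds_ratio:.1f}, p={p_value:.2g}")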
Cosine similarity method
•  Represent documents (query and report) as vectors of terms
•  Compare the cosine of the angle they form (formula below)
•  Cosine ~ 1: query ~ report
•  Cosine ~ 0: query != report
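The standard cosine similarity between a query term vector q and a report term vector r is:

\cos(\theta) = \frac{\vec{q} \cdot \vec{r}}{\lVert \vec{q} \rVert \, \lVert \vec{r} \rVert} = \frac{\sum_i q_i r_i}{\sqrt{\sum_i q_i^{2}}\,\sqrt{\sum_i r_i^{2}}}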
Example
•  Vector MEDDRA SMQ: 'Choking', 'Cough', 'Oedema', 'Rash'
•  Vector REPORT#72: 'Oedema', 'Rash', 'Vomiting'
•  Vector REPORT#104: 'Palpitations', 'Fatigue', 'Neuropathy'
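The comparison for these example vectors can be sketched with simple binary term vectors (term present = 1, absent = 0); the actual scoring in the study pipeline may differ in detail.

# Cosine similarity over binary term vectors, applied to the example above.
import math

def cosine(query_terms, report_terms):
    vocab = sorted(set(query_terms) | set(report_terms))
    q = [1 if t in query_terms else 0 for t in vocab]
    r = [1 if t in report_terms else 0 for t in vocab]
    dot = sum(a * b for a, b in zip(q, r))
    norm = math.sqrt(sum(q)) * math.sqrt(sum(r))   # binary vectors: sum == squared norm
    return dot / norm if norm else 0.0

smq = {"Choking", "Cough", "Oedema", "Rash"}
print(cosine(smq, {"Oedema", "Rash", "Vomiting"}))             # ~0.58: overlaps the SMQ
print(cosine(smq, {"Palpitations", "Fatigue", "Neuropathy"}))  # 0.0: no overlap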
Results - Expanded MedDRA SMQ
At the best cut-off point: sensitivity 92%, specificity 87%
Discussion
•  Using the ontology alone, sensitivity is too low for efficient screening
•  Brighton guidelines are not meant for screening, but for diagnosis confirmation
•  We improved on the screening result and reached 92% sensitivity, 87% specificity
•  Using both approaches concurrently yields the best screening results (one possible combination is sketched below)
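One possible reading of "concurrently", sketched below, is to flag a report for manual review if either screen fires; the threshold and names are illustrative assumptions, not the study's actual decision rule.

# Combine the two screens: flag a report if the OWL/reasoner classification is
# positive OR the expanded-SMQ cosine score exceeds a cut-off (value assumed).
COSINE_THRESHOLD = 0.4  # hypothetical cut-off

def flag_for_review(reasoner_positive, cosine_score):
    return reasoner_positive or cosine_score >= COSINE_THRESHOLD

print(flag_for_review(False, 0.58))  # True: caught by the expanded SMQ screen
print(flag_for_review(True, 0.10))   # True: caught by the ontology classification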
Key outcomes
•  Current encoding standards don’t allow for complete
representation of events
•  e.g., missing temporality descriptors (sudden onset, rapid
progression)
•  Critical for diagnosis confirmation and causality assessment
•  Information lacking in reports from surveillance systems
•  Not assessed? Not recorded? Negative?
•  Logical translation of guidelines allows for better
detection of inconsistencies and errors
•  We are working with the Brighton Collaboration towards adding a
logical formalization to the existing case definitions
Use of the ontology for reporting
• In current systems:
•  Fast screening -> fast detection of potentially positive
reports
•  The reporter can be sent a more detailed report form, e.g. a “Brighton-based anaphylaxis report form”
• In future systems:
•  Implementation of the ontology-based system at the
time of data entry
•  Provides labels and textual definitions for each term
•  Enables consistency checking
Next steps: IRIDA project
•  Integrated Rapid Infectious Disease Analysis
•  http://www.irida.ca
•  IRIDA is a bioinformatics platform for genomic
epidemiology analysis to improve outbreak surveillance
and detection
•  Collaboration between academia and public health
•  Ontologies will be developed to annotate clinical, lab, and epidemiology data, and to integrate them for further analysis
Acknowledgements
•  Ryan Brinkman, BC Cancer Agency, Vancouver, Canada
•  Alan Ruttenberg, University at Buffalo, New York, USA
•  Julie Lafleche, Robert Pless, Barbara Law, Public
Health Agency of Canada, Ottawa, Ontario
•  Jan Bonhoeffer, Brighton Collaboration, Basel,
Switzerland
•  IRIDA project: Fiona Brinkman, William Hsiao

Biocuration 2014 - Effective automated classification of adverse events using ontology-based annotations
