2. Investigators
(in alphabetical order)
Childrens Hospital Boston and HMS (site PI: Guergana
Savova)
MIT (site PI: Peter Szolovits)
MITRE corporation (site PI: Lynette Hirschman)
Seattle Group Health (site PI: David Carrell)
SUNY Albany (site PI: Ozlem Uzuner)
University of California, San Diego (site PI: Wendy
Chapman
University of Colorado (site PI: Martha Palmer)
University of Pittsburg (site PI: Henk Harkema)
University of Utah and Intermountain Healthcare (site
PI: Peter Haug)
3. Special Acknowledgement
Our talented super software developers
– Vinod Kaggal, lead
– Dingcheng Li
– Pei Chen
– James Masanz
4. Overview
Part 1:
– Background and objectives of SHARP 4 cNLP
project
– Year 1 achievements
– Clinical Text Analysis and Knowledge
Extraction System (cTAKES)
– Year 2 proposed projects
– Graphical User Interface to cTAKES: demo
Part 2:
– cTAKES: demo
5. Aims
Information extraction (IE): transformation of
unstructured text into structured representations
and merging clinical data extracted from free text
with structured data
– Entity and Event discovery
– Relation discovery
– Normalization template: Clinical Element Model (CEM)
Overarching goal
– high-throughput phenotype extraction from clinical free
text based on standards and the principles of
interoperability
– general purpose clinical NLP tool with applications to the
majority of all imaginable use cases
6. A 43-year-old woman was
diagnosed with type 2 diabetes
mellitus by her family physician 3
mpresentation. Her initial blood
glucose was 340 mg/dL.
Glyburide
A 43-year-old woman was
diagnosed with type 2
diabetes mellitus by her
family physician 3 months
before this presentation.
Her initial blood glucose
was 340 mg/dL. Glyburide
A 43-year-old woman was diagnosed
with type 2 diabetes mellitus by her
family physician 3 months before
this presentation. Her initial blood
glucose was 340 mg/dL. Glyburide
A 43-year-old woman was diagnosed with type 2 diabetes mellitus by her
family physician 3 months before this presentation. Her initial blood
glucose was 340 mg/dL. Glyburide 2.5 mg once daily was prescribed. Since
then, self-monitoring of blood glucose (SMBG) showed blood glucose levels
of 250-270 mg/dL. She was referred to an endocrinologist for further
evaluation.
On examination, she was normotensive and not acutely ill. Her body mass
index (BMI) was 18.7 kg/m2 following a recent 10 lb weight loss. Her
thyroid was symmetrically enlarged and ankle reflexes absent. Her blood
glucose was 272 mg/dL, and her hemoglobin A1c (HbA1c) was 10.3%. A
lipid profile showed a total cholesterol of 261 mg/dL, triglyceride level of
321 mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid
function was normal. Urinanalysis showed trace ketones.
She adhered to a regular exercise program and vitamin regimen, smoked 2
packs of cigarettes daily for the past 25 years, and limited her alcohol
intake to 1 drink daily. Her mother's brother was diabetic.
Processing Clinical Notes
A 43-year-old woman was diagnosed with type 2 diabetes
mellitus by her family physician 3 months before this
presentation. Her initial blood glucose was 340 mg/dL.
Glyburide 2.5 mg once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed blood
glucose levels of 250-270 mg/dL. She was referred to an
endocrinologist for further evaluation.
On examination, she was normotensive and not acutely
ill. Her body mass index (BMI) was 18.7 kg/m2 following
a recent 10 lb weight loss. Her thyroid was
symmetrically enlarged and ankle reflexes absent. Her
blood glucose was 272 mg/dL, and her hemoglobin A1c
(HbA1c) was 10.3%. A lipid profile showed a total
cholesterol of 261 mg/dL, triglyceride level of 321
mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL.
Thyroid function was normal. Urinanalysis showed trace
ketones.
She adhered to a regular exercise program and vitamin
regimen, smoked 2 packs of cigarettes daily for the
past 25 years, and limited her alcohol intake to 1
drink daily. Her mother's brother was diabetic.
7. Clinical Element Model
Disorder CEM
text: diabetes mellitus
code: 73211009
subject: patient
relative temporal context: 3 months ago
negation indicator: not negated
Disorder CEM
text: diabetes mellitus
code: 73211009
subject: family member
relative temporal context:
negation indicator: not negated
Tobacco Use CEM
text: smoking
code: 365981007
subject: patient
relative temporal context: 25 years
negation indicator: not negated
Medication CEM
text: Glyburide
code: 315989
subject: patient
frequency: once daily
negation indicator: not negated
strength: 2.5 mg
A 43-year-old woman was diagnosed with type 2 diabetes
mellitus by her family physician 3 months before this
presentation. Her initial blood glucose was 340 mg/dL.
Glyburide 2.5 mg once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed blood
glucose levels of 250-270 mg/dL. She was referred to an
endocrinologist for further evaluation.
On examination, she was normotensive and not acutely
ill. Her body mass index (BMI) was 18.7 kg/m2 following
a recent 10 lb weight loss. Her thyroid was
symmetrically enlarged and ankle reflexes absent. Her
blood glucose was 272 mg/dL, and her hemoglobin A1c
(HbA1c) was 10.3%. A lipid profile showed a total
cholesterol of 261 mg/dL, triglyceride level of 321
mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL.
Thyroid function was normal. Urinanalysis showed trace
ketones.
She adhered to a regular exercise program and vitamin
regimen, smoked 2 packs of cigarettes daily for the
past 25 years, and limited her alcohol intake to 1
drink daily. Her mother's brother was diabetic.
A 43-year-old woman was diagnosed with type 2 diabetes
mellitus by her family physician 3 months before this
presentation. Her initial blood glucose was 340 mg/dL.
Glyburide 2.5 mg once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed blood
glucose levels of 250-270 mg/dL. She was referred to an
endocrinologist for further evaluation.
On examination, she was normotensive and not acutely
ill. Her body mass index (BMI) was 18.7 kg/m2 following
a recent 10 lb weight loss. Her thyroid was
symmetrically enlarged and ankle reflexes absent. Her
blood glucose was 272 mg/dL, and her hemoglobin A1c
(HbA1c) was 10.3%. A lipid profile showed a total
cholesterol of 261 mg/dL, triglyceride level of 321
mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL.
Thyroid function was normal. Urinanalysis showed trace
ketones.
She adhered to a regular exercise program and vitamin
regimen, smoked 2 packs of cigarettes daily for the
past 25 years, and limited her alcohol intake to 1
drink daily. Her mother's brother was diabetic.
A 43-year-old woman was diagnosed with type 2 diabetes
mellitus by her family physician 3 months before this
presentation. Her initial blood glucose was 340 mg/dL.
Glyburide 2.5 mg once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed blood
glucose levels of 250-270 mg/dL. She was referred to an
endocrinologist for further evaluation.
On examination, she was normotensive and not acutely
ill. Her body mass index (BMI) was 18.7 kg/m2 following
a recent 10 lb weight loss. Her thyroid was
symmetrically enlarged and ankle reflexes absent. Her
blood glucose was 272 mg/dL, and her hemoglobin A1c
(HbA1c) was 10.3%. A lipid profile showed a total
cholesterol of 261 mg/dL, triglyceride level of 321
mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL.
Thyroid function was normal. Urinanalysis showed trace
ketones.
She adhered to a regular exercise program and vitamin
regimen, smoked 2 packs of cigarettes daily for the
past 25 years, and limited her alcohol intake to 1
drink daily. Her mother's brother was diabetic.
A 43-year-old woman was diagnosed with type 2 diabetes
mellitus by her family physician 3 months before this
presentation. Her initial blood glucose was 340 mg/dL.
Glyburide 2.5 mg once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed blood
glucose levels of 250-270 mg/dL. She was referred to an
endocrinologist for further evaluation.
On examination, she was normotensive and not acutely
ill. Her body mass index (BMI) was 18.7 kg/m2 following
a recent 10 lb weight loss. Her thyroid was
symmetrically enlarged and ankle reflexes absent. Her
blood glucose was 272 mg/dL, and her hemoglobin A1c
(HbA1c) was 10.3%. A lipid profile showed a total
cholesterol of 261 mg/dL, triglyceride level of 321
mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL.
Thyroid function was normal. Urinanalysis showed trace
ketones.
She adhered to a regular exercise program and vitamin
regimen, smoked 2 packs of cigarettes daily for the
past 25 years, and limited her alcohol intake to 1
drink daily. Her mother's brother was diabetic.
8. Comparative Effectiveness
Disorder CEM
text: diabetes mellitus
code: 73211009
subject: patient
relative temporal context: 3 months ago
negation indicator: not negated
Disorder CEM
text: diabetes mellitus
code: 73211009
subject: family member
relative temporal context:
negation indicator: not negated
Tobacco Use CEM
text: smoking
code: 365981007
subject: patient
relative temporal context: 25 years
negation indicator: not negated
Medication CEM
text: Glyburide
code: 315989
subject: patient
frequency: once daily
negation indicator: not negated
strength: 2.5 mg
Compare the effectiveness of different treatment
strategies (e.g., modifying target levels for glucose,
lipid, or blood pressure) in reducing cardiovascular
complications in newly diagnosed adolescents and
adults with type 2 diabetes.
Compare the effectiveness of traditional behavioral
interventions versus economic incentives in
motivating behavior changes (e.g., weight loss,
smoking cessation, avoiding alcohol and substance
abuse) in children and adults.
9. Meaningful Use
Disorder CEM
text: diabetes mellitus
code: 73211009
subject: patient
relative temporal context: 3 months ago
negation indicator: not negated
Disorder CEM
text: diabetes mellitus
code: 73211009
subject: family member
relative temporal context:
negation indicator: not negated
Tobacco Use CEM
text: smoking
code: 365981007
subject: patient
relative temporal context: 25 years
negation indicator: not negated
Medication CEM
text: Glyburide
code: 315989
subject: patient
frequency: once daily
negation indicator: not negated
strength: 2.5 mg
• Maintain problem list
• Maintain active med list
• Record smoking status
• Provide clinical summaries for each office visit
• Generate patient lists for specific conditions
• Submit syndromic surveillance data
10. Clinical Practice
Disorder CEM
text: diabetes mellitus
code: 73211009
subject: patient
relative temporal context: 3 months ago
negation indicator: not negated
Medication CEM
text: Glyburide
code: 315989
subject: patient
frequency: once daily
negation indicator: not negated
strength: 2.5 mg
• Provide problem list and meds from the visit
11. Applications
Meaningful use of the EMR
Comparative effectiveness
Clinical investigation
– Patient cohort identification
– Phenotype extraction
Epidemiology
Clinical practice
…..
12. How does NLP fit?
Demo pipeline, v1
– All medications in Mayo dataset extracted
with cTAKES (NLP method)
– Processed 360,452 notes for 10,000 patients
– 3,442,000 CEMs were created
– Processing time: 1.6 sec/doc
14. Y1 Technical and Scientific
Activities
Gold standard corpus development:
– corpus creation methodology
– de-id and PHI surrogate generation tools
– seed corpus generation (PAD, pneumonia, breast cancer)
– annotation schema development based on CEM normalization target
– annotation guidelines and pilot annotations
– gold standard annotations are in progress
Type System for software development
Development of Evaluation workbench
Methods development
– entity and event discovery
– relation discovery
15. Y1 Software Deliverables
(cTAKES modules)
JUL AUG SEP OCT NOV DEC JAN FEB MAR APR MAY JUN
2010 2011
Dependency Parser
Drug Profile Module
Smoking Status
Classifier
CEM ‘orderMedAmb’
Population
Full-Cycle
Pipeline v1
16. SHARP Security Roundtable
for Cloud-Deployed cNLP
May 23-24, 2011
Participants: SHARP 1, SHARP 4, health care
organizations, the Veterans Administration, industry, and
other research institutions
Providing guidance to institutions seeking to use cloud
technologies to support development and application of
cNLP tools
A set of recommendations for the novel legal and
governance issues regarding the proper stewardship and use
of clinical data
17. SHARP Collaborations
SHARP 1:
– Around security in a cloud computing
environment
SHARP 3 (SMaRT):
– Around extraction of data from the clinical
narrative
– I2b2 database for data persistence?
18. Partnerships
NCBC-funded initiatives
– Integrating Informatics and Biology to the Bedside (i2b2)
– Integrating Data for Analysis, Anonymization and Sharing (iDASH)
– Ontology Development and Information Extraction (ODIE)
Veterans Administration
R01s
– Shared annotated lexical resource
– Temporal relation discovery for the clinical domain
– Milti-source integrated platform for answering clinical questions
University of York (UK), University of Trento (Italy),
Brandeis University (USA)
eMERGE, PGRN (Pharmacogenomics Research Network)
21. Overview
• Goal:
• Phenotype extraction
• Generic – to be used for a variety of retrievals and use cases
• Expandable – at the information model level and methods
• Modular
• Cutting edge technologies – best methods combining existing
practices and novel research with rapid technology transfer
• Terminology agnostic: able to plug in any terminology
• Best software practices (80M+ notes)
• Stand-alone tool easily pluggable within other platforms/toolsets
• Apache v2.0 license
• http://sourceforge.net/projects/ohnlp/
• Commitment to both R and D in R&D
28. Agent
Loc
the patient will complete his thiotepa dose today , and he will return
tomorrow for the last dose of his thiotepa .
His donor completed stem-cell collection yesterday
The patient returns to the outpatient clinic today for follow-up
Courtesy of Martha Palmer
29. Agent
Agent
Loc Theme
, and he will return
tomorrow for the last dose of his thiotepa .
His donor completed stem-cell collection yesterday
The patient returns to the outpatient clinic today for follow-up
the patient will complete his thiotepa dose today
Courtesy of Martha Palmer
30. Agent
Agent
Loc Theme
Agent
Purpose
His donor completed stem-cell collection yesterday
The patient returns to the outpatient clinic today for follow-
up the patient will complete his thiotepa dose today , and
he will return tomorrow for the last dose of his thiotepa .
Courtesy of Martha Palmer
31. Agent Action
Agent
Agent
Loc Theme
Agent
Purpose
Coreference:
“patient’s donor”
The patient returns to the outpatient clinic today for follow-up
the patient will complete his thiotepa dose today , and he will return
tomorrow for the last dose of his thiotepa .
His donor completed stem-cell collection yesterday
Courtesy of Martha Palmer
34. Y2 Proposed Deliverables
Release of a library of de-identification tools (Sept, 2011)
– MIST
– MIT/SUNY
Evaluation workbench (Sept, 2011)
cTAKES Side Effects module (Aug, 2011)
Modules for relation extraction (Dec, 2011)
– Semantic role labeler
– Relation classifier
– Integration of CLEAR-TK (University of Colorado)
End-to-end tool, v2 (cTAKES v2) (April, 2012)
– NLP to populate CEMs for Diseases, Sign/Symptoms, Procedures,
Labs, Anatomical sites
– Integration of LexGrid/LexEVS services
35. Development Challenges and
Opportunities
Open source strategy
Release early release often
Test driven development with continuous
integration
All milestones measured by what we can get IRB
and DUA approved and deployed with real or de-
identified clinical data
37. Partnerships
Strengthen existing SHARP collaborations
– Initiate collaborations with SHARP 2 around
usability
– SHARP 1: methods for data security in a cloud
deployed framework
– I2b2: the glue between SHARP 3 and SHARP
4
Non-SHARP collaborations
39. cTAKES as a Service
Objectives
1. Demo cTAKES prototype web application
Empower End Users to leverage cTAKES
2. Gather feedback for future cTAKES GUI
3. Potential system integrations with other
applications
(i.e. i2b2, ARC, Web Annotator)
Developed within i2b2 to integrate
cTAKES in the i2b2 NLP cell