Using NLP and curation to make clinical data available for research

Assisted Chart Abstraction:
a technique to help while
we wait for Nirvana

The Driver

Many entities within Northwestern Medicine (NM) want to
capture data about cancer patients treated at NM.

Research

Education

Operational/Outcomes Analysis

QA/QC and Process Improvement

Marketing/Branding/Outreach Assessment

Challenges

NM has multiple EHR systems: Epic (NMFF), PowerChart
(NMH) and Mosaiq (Radiation Oncology).

Not all clinical systems flow into one of the EHRs.

Not all relevant data is discretely captured during the course of
clinical care. For example, pathology diagnosis is recorded
in a textual document.

Northwestern Medicine Enterprise
Data Warehouse (NMEDW)

The NMEDW is:

One-stop shop for finding data from 40+ clinical systems, 10
years of data, and 2.2 million patients (4 billion events!)

Optimized cross-system data marts representing major
biomedical entities and events: patients, providers,
encounters, labs, medications...and more.

Intelligent structures, data representations, and the ability to
identify and correlate data across patients, events and data
types

Who is requesting
change?

The Northwestern Brain Tumor Institute*

SPORE in Prostate Cancer

Lynn Sage Breast Cancer Center

Gastrointestinal Oncology Group

Many others - typically disease-focused

* We will focus mainly on use cases and workflows from BTI

What data do they need?

Demographics

Diagnosis

Treatment

Disease Progression

Survival

Old Solution

Data coordinator opens up EHR(s) and manually copies data
into a clinical database.

Newer solution: Data coordinator pulls data from reports run
against the NMEDW and copies/extracts/annotates them
into the clinical database

Command + Tab Model

A manually curated database disconnected from EHR data.

Depends on a data coordinator finding and manually copying
data from the EHR to a clinical database.

Command + Tab Model:
Pros

Depends on humans:

Humans are great at interpreting narrative documentation, where a significant portion of cancer clinical data
(unfortunately) resides.

Command + Tab Model:
Cons

Depends on humans:

Difficult for a human to be aware of every relevant
medical event of every patient within a cohort.

Ignores the flux that occurs within EHRs: patient medical
histories merging and splitting.

Humans get bored with rote copying of discrete data.

Humans quit and get new jobs.

New Solution:
NBTI Data Capture Tool

The NBTI Data Capture Tool automatically pulls (via the
NMEDW) relevant EHR data for each patient.

Data points discretely captured in the EHR need no further
review.

Data points captured non-discretely in textual documents are
abstracted via natural language processing (NLP) and
presented to a data coordinator for review/revision.

Why not use reports?

Lots of valuable clinical data still resides in narrative
documents.

Not all discrete data contained within the EHR(s) has
been normalized into easily queryable structures in
the NMEDW.

Today an investigator cannot ask an NMEDW analyst
the question and get a quick result:

How many IDH1 negative glioma patients survived
longer than 5 years?

Waiting for Nirvana

NMEDW reports will not obsolete research clinical
databases until:

Clinical IT optimizes the EHRs to discretely
capture all relevant data points (ain’t happenin’)

The NMEDW normalizes all EHR data into easily
digestible formats and to reference
terminologies (limited by above step!)

Sources for the first
iteration

Epic: support discrete data capture of fundamental treatment/
diagnosis data points.

Epic/MyChart: embed intake form.

Cerner: support discrete data capture of pathology data points.

Cerner: support explicit association between pathology cases
and surgeries.

MOSAIQ: support discrete data capture of site, laterality for
radiation therapies.

Analyze the Data

Started with a list of data elements and sample data from a
neuro-oncologist and a neurosurgeon

Determined obtainability of each data element:

Discrete in the EHR and in the EDW.

Discrete in the EHR but not in the EDW.

Narrative document in the EHR and in the EDW.

Narrative document in the EHR but not in the EDW.
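The four obtainability categories form a two-by-two grid: is the element discrete or narrative in the EHR, and does it reach the EDW? The sketch below is one way to record that triage. The `DataElement` class, the example elements and the follow-up actions in `next_step` are illustrative assumptions, not part of the NBTI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    discrete_in_ehr: bool  # False means it lives in a narrative document
    in_edw: bool           # True if the EDW already receives it


def obtainability(element: DataElement) -> str:
    """Map an element onto one of the four categories listed above."""
    form = "discrete" if element.discrete_in_ehr else "narrative"
    reach = "in EDW" if element.in_edw else "not in EDW"
    return f"{form}, {reach}"


def next_step(element: DataElement) -> str:
    """Hypothetical follow-up action for each category."""
    if not element.in_edw:
        return "ask the EDW team to add a feed"
    if element.discrete_in_ehr:
        return "import as-is"
    return "abstract with NLP, then review"


# Example elements (assumed for illustration only).
elements = [
    DataElement("birth date", discrete_in_ehr=True, in_edw=True),
    DataElement("pathology diagnosis", discrete_in_ehr=False, in_edw=True),
]
for e in elements:
    print(e.name, "|", obtainability(e), "|", next_step(e))
```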

Build an EDW
Data Mart

Engaged the NMEDW team to build an NBTI-dedicated data mart and
extract, transform, load (ETL) script:

patients

encounters

medications

surgeries

surgery notes

pathology cases

gamma knifes

radiation therapies

imaging exams

progress notes

labs

Build a Clinical Database

Build a clinical database mirroring the structure of the
EDW data mart in a PostgreSQL server

Add database structures to allow for the layering of
curated data on top of data imported from the EDW.
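One way to picture the layering is a separate curation table whose values take precedence over the imported ones. This is a sketch under assumptions: the table, column and view names are hypothetical, and Python's built-in `sqlite3` stands in for the PostgreSQL server so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pathology_cases (            -- refreshed by the EDW import
    id INTEGER PRIMARY KEY,
    diagnosis TEXT
);
CREATE TABLE pathology_case_curations (   -- written only by the data coordinator
    pathology_case_id INTEGER PRIMARY KEY REFERENCES pathology_cases(id),
    diagnosis TEXT NOT NULL,
    curated_by TEXT NOT NULL
);
CREATE VIEW pathology_cases_effective AS  -- curated value wins when present
SELECT p.id,
       COALESCE(c.diagnosis, p.diagnosis) AS diagnosis,
       c.pathology_case_id IS NOT NULL AS is_curated
FROM pathology_cases p
LEFT JOIN pathology_case_curations c ON c.pathology_case_id = p.id;
""")

conn.executemany(
    "INSERT INTO pathology_cases VALUES (?, ?)",
    [(1, "astrocytoma"), (2, "meningioma")],
)
conn.execute(
    "INSERT INTO pathology_case_curations VALUES (1, 'glioblastoma', 'coordinator')"
)

# A later re-import overwrites only the imported table; the curation survives.
conn.execute("UPDATE pathology_cases SET diagnosis = 'astrocytoma, NOS' WHERE id = 1")

rows = conn.execute(
    "SELECT id, diagnosis, is_curated FROM pathology_cases_effective ORDER BY id"
).fetchall()
print(rows)
```

Keeping curated values in their own table means the nightly import never has to know which rows a human has touched.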

Import Data

Expose the data in the EDW data mart as
web services via SQL Reporting Service
reports.

Automate via cron jobs the pull of data
into the clinical database from the EDW
with shared EDW web service adapter
code.
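The import step can be sketched as "parse the report payload, then upsert by source key", so that a scheduled job can run repeatedly without duplicating rows. The example assumes the web service returns CSV text and leaves out the fetch itself. The `encounters` table, its columns and the function names are hypothetical, and `sqlite3` (3.24 or later, for `ON CONFLICT`) stands in for PostgreSQL.

```python
import csv
import io
import sqlite3


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS encounters ("
        "source_id TEXT PRIMARY KEY, patient_id TEXT, encounter_date TEXT)"
    )


def parse_report(csv_text):
    """Turn a CSV report payload into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def upsert_encounters(conn, rows):
    """Insert new encounters and refresh existing ones, keyed on source_id."""
    conn.executemany(
        """
        INSERT INTO encounters (source_id, patient_id, encounter_date)
        VALUES (:source_id, :patient_id, :encounter_date)
        ON CONFLICT(source_id) DO UPDATE SET
            patient_id = excluded.patient_id,
            encounter_date = excluded.encounter_date
        """,
        rows,
    )
    conn.commit()


# Two simulated nightly pulls; the second corrects a date and adds a row.
first_pull = "source_id,patient_id,encounter_date\nE1,P1,2013-01-05\n"
second_pull = (
    "source_id,patient_id,encounter_date\nE1,P1,2013-01-06\nE2,P1,2013-02-01\n"
)

conn = sqlite3.connect(":memory:")
create_schema(conn)
for payload in (first_pull, second_pull):
    upsert_encounters(conn, parse_report(payload))
```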

Patients

The criteria for inclusion within the NBTI system are
determined by a list of ICD diagnosis codes. Criteria
could alternatively be determined by consent to a
protocol.
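Inclusion by diagnosis code reduces to a membership test against a code list. The slides do not give the actual NBTI list, so the prefixes below (ICD-9 191 and ICD-10 C71, both malignant neoplasm of brain) are illustrative assumptions, as is the function name.

```python
# Illustrative prefixes only; the real NBTI inclusion list is not shown in the slides.
INCLUSION_PREFIXES = ("191", "C71")


def is_included(diagnosis_codes, prefixes=INCLUSION_PREFIXES):
    """Return True if any of the patient's ICD codes starts with an inclusion prefix."""
    return any(
        code.strip().upper().startswith(prefixes) for code in diagnosis_codes
    )


print(is_included(["401.9", "C71.1"]))  # hypertension plus a brain tumor code
print(is_included(["401.9"]))           # hypertension only
```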

Pull from the NMEDW patient name, birth date, MRN(s),
gender, ethnicity, race, death date and last seen date
(across Northwestern Medicine - NM).

Integrate with Specimen
Inventory Data

Prepare data for migration into PathCore's specimen
inventory system BSI2.

Allow for ad hoc query exploration of specimen
inventory based on clinical data points.

Standardizing the structure of clinical data captures
across sites makes this possible.

NLP

Build an NLP pipeline to abstract discrete data points
from the flow of narrative documents and textual
fragments.

Use the Stanford NLP library for chunking and sentence
splitting.

Use the lingscope library to parse the negation scope of
sentences.

Use the NCI Metathesaurus for synonym lookups and
attaching codes.
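The shape of such a pipeline (split into sentences, find known terms, decide whether each mention is negated, attach a code) can be shown with a toy stand-in. This is not the Stanford NLP, LingScope or NCI Metathesaurus API: the regular expressions, the negation cue list and the concept identifiers below are all simplified assumptions made for illustration.

```python
import re

# Hypothetical synonym table; a real pipeline would look terms up in the NCI Metathesaurus.
SYNONYMS = {
    "glioblastoma": "CONCEPT_GLIOBLASTOMA",
    "gbm": "CONCEPT_GLIOBLASTOMA",
    "necrosis": "CONCEPT_NECROSIS",
}

# Crude negation cues; a real pipeline would use a negation scope parser instead.
NEGATION_CUES = re.compile(r"\b(no|without|negative for)\b")


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def abstract(text):
    """Return (concept, negated, sentence) for each known term found."""
    findings = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for term, concept in SYNONYMS.items():
            match = re.search(rf"\b{re.escape(term)}\b", lowered)
            if not match:
                continue
            # Treat the mention as negated if a cue appears earlier in the sentence.
            negated = bool(NEGATION_CUES.search(lowered[: match.start()]))
            findings.append((concept, negated, sentence))
    return findings


report = "Findings are consistent with GBM. No necrosis is seen."
for finding in abstract(report):
    print(finding)
```

Each finding keeps its source sentence so that a reviewer can see the evidence behind a suggested value.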

Electronic Intake Form

Deploy an electronic intake form that can be filled out by a
patient before or at their first clinic visit.

An email is sent; the form can be filled out in a web browser, on a tablet or
(painfully) on a smartphone.

Biopsy, Surgery and
Pathology Diagnosis

Pull from the NMEDW NM biopsies, surgeries, surgical
procedure reports and pathology cases (inside and out).

Abstract and allow for the confirmation/revision of surgery
type, site, laterality, pathology diagnosis, grade,
recurrence, anatomical location of primary, cancer staging
and pathology test results.

Present the NLP
Abstractions

Present NLP-derived abstractions as queues of work that
the data coordinator needs to confirm or revise.
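A work queue of this kind can be modeled as suggestions that start out pending and leave the queue once a person confirms or revises them. The class, field and status names in this sketch are hypothetical and do not describe the tool's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Abstraction:
    document_id: int
    field_name: str
    suggested_value: str
    status: str = "pending"           # pending -> confirmed | revised
    final_value: Optional[str] = None

    def confirm(self) -> None:
        """The coordinator accepts the NLP suggestion unchanged."""
        self.status = "confirmed"
        self.final_value = self.suggested_value

    def revise(self, value: str) -> None:
        """The coordinator replaces the NLP suggestion with a corrected value."""
        self.status = "revised"
        self.final_value = value


def work_queue(abstractions: List[Abstraction]) -> List[Abstraction]:
    """Items still waiting for a human decision."""
    return [a for a in abstractions if a.status == "pending"]


items = [
    Abstraction(101, "laterality", "left"),
    Abstraction(101, "grade", "III"),
    Abstraction(102, "site", "frontal lobe"),
]
items[0].confirm()
items[1].revise("IV")
print([a.field_name for a in work_queue(items)])
```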

Gamma Knife Radiation
Therapy

Pull from the NMEDW NM radiation therapies.

Abstract and allow for the confirmation/revision of
site and laterality.

Intravenous
Chemotherapy

Pull from the NMEDW NM intravenous
chemotherapy treatments (from Intellidose and
Epic Beacon).

Labs and Other
Medications

Labs

• Pull from the NMEDW NM labs.

Other Medications

• Pull from the NMEDW NM non-intravenous,
prescribed/ordered medications.

• Allow for confirmation/revision of drug, route,
duration, amount, patient parameter and
administered.

Imaging Exams and Clinic
Visits

Imaging Exams

• Pull from the NMEDW NM imaging exams.

• Abstract and allow for confirmation/revision of
response/progression declarations and lesion
measurements.

Clinic Visits

• Pull from the NMEDW NM clinic visit notes. Abstract
and allow for confirmation/revision of performance
status declarations and tracking of outside treatments.

Reporting

Ad-hoc query exploration of data.

Integrate NMH quality metrics.

Generate Kaplan-Meier survival curves against SEER
data on demand.
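For reference, the Kaplan-Meier (product-limit) estimate multiplies, at each observed death time, the fraction of at-risk patients who survive it: S(t) is the product over death times t_i <= t of (1 - d_i / n_i). The sketch below computes that curve in plain Python from follow-up durations and a death/censored flag. The function name and the sample data are made up for illustration, and the comparison against SEER data is not shown.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct time with an observed death.

    durations: follow-up time per patient (for example, months from diagnosis).
    events: True if death was observed at that time, False if censored
            (for example, still alive at the last seen date).
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group everyone whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


# Made-up cohort of five patients: three deaths, two censored.
curve = kaplan_meier([6, 12, 12, 20, 30], [True, True, False, True, False])
for time, s in curve:
    print(f"t={time}: S(t)={s:.2f}")
```

Censored patients still count in the risk set up to their last follow-up, which is why the last seen date pulled from the NMEDW matters for survival analysis.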

Export

Export into Word, Excel, CSV for analysis and
visualization by SAS, SPSS, R, etc.
