Applying NLP to Personalized Healthcare - 2021

Dr. David Talby, CTO, John Snow Labs
Real-World Lessons from Applying
Natural Language Processing to
Personalized Healthcare

2
Agenda
1. Personalized Medicine has a big NLP challenge
2. State-of-the-art accuracy has recently improved
3. Real-world solutions require hyper-specialization

3
Understanding
the Language
of Healthcare

4
John Snow Labs: The Team Behind Spark NLP
• Most popular NLP library in the enterprise (O’Reilly Media)
• 54% share of healthcare AI teams use Spark NLP (Gradient Flow)
• 9x growth in downloads of the library during 2020 (PyPI Download Stats)

5
Today’s Medicine Is Imprecise
80%
non-responders for
the top 20 drugs
So, how shall we fix it?

6
Case Study:
Recommending The Next Best Action
• A clinical guideline is a document that guides decisions and
criteria regarding diagnosis, management, and treatment.
• Modern medical guidelines are based on an examination of
current evidence within the paradigm of evidence-based
medicine.
• They usually summarize consensus statements on best
practice.
• A doctor is obliged to know the medical guidelines of their
profession and must decide whether to follow them for an
individual patient.

7
Where Is The Clinically Relevant
Oncology Data?
“A detailed study found that only 40% of about 300 data points required for clinical decision support were available in structured data.”

8
ONCOLOGY: Information Extraction
Examples
Token (Word)             Entity Category (Label)
Upper outer quadrant     Tumor Site (Localization)
pT1                      Primary Tumor (pT1)
pN1                      Regional Lymph Nodes (pN1)
pM1                      Distant Metastasis (pM1)
Left                     Specimen Laterality (Laterality)
4.5 x 2.0 x 2.0 cm       Size of Invasive Carcinoma (Size)
Medullary Carcinoma      Histologic Type (Type)
Poorly Differentiated    Histologic Grade (Grade)
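As a rough illustration of the kind of output shown in the table above, the sketch below pulls a few of these entity categories out of a pathology sentence using regular expressions. This is a deliberately simple stand-in, not how Spark NLP works (the deck describes trained NER models). The sample sentence, the patterns, and the function name `extract_oncology_entities` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for a few of the entity categories in the table above.
PATTERNS = {
    "Primary Tumor": re.compile(r"\bpT(?:is|[0-4Xx][a-d]?)\b"),
    "Regional Lymph Nodes": re.compile(r"\bpN[0-3Xx][a-c]?\b"),
    "Distant Metastasis": re.compile(r"\bpM[01Xx]\b"),
    "Laterality": re.compile(r"\b(?:left|right|bilateral)\b", re.IGNORECASE),
    "Size": re.compile(
        r"\b\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?){0,2}\s*(?:cm|mm)\b"
    ),
}


def extract_oncology_entities(text):
    """Return (matched text, label) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(0), label))
    return [(token, label) for _, token, label in sorted(found)]


# Made-up sentence built from the tokens in the table above.
sample = (
    "Left breast, upper outer quadrant: medullary carcinoma, "
    "4.5 x 2.0 x 2.0 cm, pT1 pN1 pM1."
)
entities = extract_oncology_entities(sample)
for token, label in entities:
    print(f"{token:<22} {label}")
```

Patterns like these only cover rigidly formatted codes. Categories such as tumor site, histologic type, and grade vary too much in wording for this approach, which is one reason trained models are used instead.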

9
Case Study #2:
Matching Patients To Clinical Trials
5,446 cutting-edge treatments are behind the gates of clinical trials:
• 1,870 Cancer drugs
• 139 Alzheimer’s drugs
• 94 Asthma drugs
• 71 COPD drugs
• 162 Pain drugs
• 138 Rheumatoid arthritis drugs
• 91 Parkinson’s drugs
• 68 HIV/AIDS drugs
• 154 Type 2 diabetes drugs
• 105 Psoriasis drugs
• 2,554 Other
SOURCE: Estimated based on Pharmaprojects, “Pharma R&D Annual Review 2017”

10
Not Enough People Participate
In Clinical Trials
The consequences, for one blockbuster
drug, due to the average delay in clinical
trial recruiting:
993 human lives lost
$2.9B revenue lost
5x drug price increase
SOURCE: Cutting Edge Information, FiercePharma, Life Extension Magazine
https://medrio.com/blog/overcoming-patient-recruitment-and-retention-hurdles/

11
The NLP Challenge:
Complex Enrollment Criteria
“Increasingly complex enrollment criteria made off-the-shelf models that extract findings, treatments, drugs, labs, etc. largely useless.”

12
Case Study #3:
Populating Patient Registries
• A patient registry is a database of facts about patients who are
affected by a particular condition.
• In rare diseases, they play an important role in the therapy
development pathway.
• They can identify candidates for clinical trials, help develop
standards of care, and accelerate medical research.
• The average U.S. healthcare facility participates in 5 to 10
registries, sometimes due to regulatory requirements.
• Registries are usually filled in manually, implying significant
cost, time, and effort.
Source: https://treat-nmd.org/patient-registries/

13
This Is Largely
An NLP Challenge, Too
“PLAN: Follow up with dietitian to manage hypertension & borderline LDL and increase dosage of lisinopril since BP was 152/85 today” (for Patient K)
Source: https://www.ahrq.gov/ncepcr/tools/pf-handbook/mod20-appendix-b.html

14
Why this NLP Challenge is Important
• It takes weeks to manually update a registry (how fast
can we get real-world data on COVID-19 vaccines and
long-term impact?)
• It delays clinical trials, by adding major projects to each
trial
• Trained nurses who do this work aren’t treating patients
• It’s expensive, making exploratory abstraction

15
Agenda
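1. Personalized Medicine has a big NLP challenge
2. State-of-the-art accuracy has recently improved
3. Real-world solutions require hyper-specialization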

16
“State of the Art Natural Language Processing”
What does that mean?
Academic SOTA:
• Best accuracy on public benchmarks
• Published as peer-reviewed papers
Applied SOTA adds:
• In production in multiple organizations
• Open source or open core

17
Clinical Named Entity Recognition

18
1. State of the Art Biomedical NER
Spark NLP for Healthcare holds the top spot on 8 of 11 benchmarks in the
‘Medical Named Entity Recognition’ category on ‘Papers With Code’
Benchmarks by entity type: Disease; Chemicals; Species; Cancer Genetics; Anatomy; Species; Cellular; Chemical, Disease

19
2. Real-World Use
• Interpreting millions of patient stories with deep learned OCR and NLP
  Stacy Ashworth, Executive Vice President of Clinical Innovation
• Spark NLP in action: Improving patient flow forecasting at Kaiser Permanente
  Santosh Kulkarni, Director of Product Mgmt., Kaiser Permanente
• A Unified CV, OCR, and NLP for Scalable Document Understanding
  Patrick Beukema, Ph.D., Senior ML Engineer, DocuSign
• Text Analytics and its Applications in the Pharma Industry
  Harsha Gurulingappa, Ph.D., Text Analytics Product Owner at Merck
• NLP in Oncology Real World Data: Opportunities to develop a true learning healthcare system
  George A. Komatsoulis, Ph.D., Chief of Bioinformatics at CancerLinQ
• Automated & Explainable Deep Learning for Clinical Language Understanding at Roche
  Vishakha Sharma, Ph.D., Principal Data Scientist, Roche

20
3. Optimized for Modern Platforms
• Optimized builds of Spark NLP for both Intel and Nvidia were
tested on AWS
• Task: Training a Named Entity Recognizer in French
• Achieving F1-score of 89% requires at least 80 Epochs with
batch size of 512
• CPUs can train even faster by using a batch size of 1024, since
they can leverage all system memory beyond just 12GB
Intel-optimized Spark NLP on Cascade Lake is 19% faster and 46% cheaper
than GPU-optimized Spark NLP on Tesla V100

21
Clinical Assertion Status Detection
• Recognized entities are useless, in most real cases, without
also knowing their assertion status.
• This is a separate NLP task, with its own algorithms and deep
learning models to address.
• Assertion status models are domain-specific and must
sometimes be trained for your use case:
o Orthopedics: Is a condition chronic?
o Surgery: Is this a past event, a plan,
or proposed treatment?
o Insurance: Is this a statement of evidence,
question about a fact, or an answer?
For each recognized entity: Is this assertion positive, negative, suspected, conditional, or about someone else?
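To make the task concrete, here is a minimal sketch that assigns one of the five statuses named above by looking for trigger phrases just before the entity. It is a simple rule-based approach in the spirit of NegEx, not the deep learning models the slide refers to. The trigger lists, the 40-character window, and the function name `assertion_status` are illustrative assumptions.

```python
import re

# Illustrative trigger phrases, checked in this order. A real system would
# need far larger, domain-specific lists or a trained model.
TRIGGERS = [
    ("someone_else", ["mother", "father", "brother", "sister", "family history"]),
    ("negative", ["no", "denies", "without", "negative for"]),
    ("suspected", ["possible", "suspected", "may have", "rule out"]),
    ("conditional", ["if", "in case of"]),
]


def assertion_status(sentence, entity, window=40):
    """Classify an entity by trigger phrases in the text just before it."""
    lowered = sentence.lower()
    pos = lowered.find(entity.lower())
    if pos < 0:
        raise ValueError(f"entity {entity!r} not found in sentence")
    context = lowered[max(0, pos - window):pos]
    for status, phrases in TRIGGERS:
        for phrase in phrases:
            # Word boundaries stop "no" from matching inside "not" or "nose".
            if re.search(r"\b" + re.escape(phrase) + r"\b", context):
                return status
    return "positive"


examples = [
    ("Patient has hypertension.", "hypertension"),
    ("Patient denies chest pain.", "chest pain"),
    ("Suspected pneumonia in the left lung.", "pneumonia"),
    ("If fever develops, return to clinic.", "fever"),
    ("Her mother had breast cancer.", "breast cancer"),
]
for sentence, entity in examples:
    print(f"{entity:<14} {assertion_status(sentence, entity)}")
```

Rules like these break quickly on real notes (triggers after the entity, double negation, long sentences), which is consistent with the point above that assertion status needs its own models and often domain-specific training.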

22
Current SOTA Metrics
Source: https://arxiv.org/abs/2012.04005 Improving Clinical Document Understanding on COVID-19 Research with Spark NLP, Kocaman and Talby, SDU workshop at AAAI 2021
Peer-Reviewed Accuracy:
Assertion detection model test metrics. Our
implementation exceeds the benchmarks in the latest
best model in 4 out of 6 assertion labels – and in
overall accuracy.
Assertion Label    Spark NLP    Latest Best
Absent             0.944        0.937
Someone-else       0.904        0.869
Conditional        0.441        0.422
Hypothetical       0.862        0.890
Possible           0.680        0.630
Present            0.953        0.957
micro F1           0.939        0.934
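The "4 out of 6" claim can be checked directly against the table. The short sketch below copies the per-label scores from the slide and counts the labels where the Spark NLP score is higher. The variable names are only illustrative.

```python
# Scores copied from the table above: label -> (Spark NLP, latest best).
scores = {
    "Absent": (0.944, 0.937),
    "Someone-else": (0.904, 0.869),
    "Conditional": (0.441, 0.422),
    "Hypothetical": (0.862, 0.890),
    "Possible": (0.680, 0.630),
    "Present": (0.953, 0.957),
}

# Labels where the Spark NLP score exceeds the previous best.
wins = [label for label, (spark_nlp, best) in scores.items() if spark_nlp > best]
print(f"{len(wins)} of {len(scores)} labels: {', '.join(wins)}")
```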

23
Relation Extraction
Identify and name relations between named entities.
Source: https://demo.johnsnowlabs.com/healthcare/RE_CLINICAL_EVENTS/

24
Current SOTA Metrics
Source: https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes – Spark NLP for Healthcare 2.7.3 Release Notes
Model                                                     Spark NLP    Current Best
Temporal relations between clinical events                71.0         80.2 [1]
Relations between symptoms, procedures, and treatments    69.2         68.2 [2]
Genes and human phenotypes                                87.9         67.2 [3]
Drug-drug interactions                                    72.1         83.8 [4]
Chemicals and proteins                                    94.1         83.64 [5]

25
Text De-Identification
Original Text:
Patient AIQING, 25 month years-old , born in Beijing, was transferred to the Johns Hopkins Hospital.
Obfuscated:
Patient BLONDIE, 35 years-old , born in Talihina, was transferred to the Mid-Columbia Medical Center.
Tagged:
Patient <NAME>, <AGE> month years-old , born in <LOCATION>, was transferred to the <LOCATION>.
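The two output styles above differ only in what replaces each detected span. The sketch below assumes the PHI spans have already been found (in practice by a trained NER model) and shows the replacement step alone. The sentence is a simplified version of the slide's example, and the helper names `span_of` and `replace_spans` and the fixed surrogate values are assumptions for illustration.

```python
def span_of(text, phrase, label):
    """Locate the first occurrence of phrase and return (start, end, label)."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def replace_spans(text, spans, replacement_for):
    """Rebuild text, swapping each (start, end, label) span for a replacement."""
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(replacement_for(text[start:end], label))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


original = (
    "Patient AIQING, 25 years-old, born in Beijing, "
    "was transferred to the Johns Hopkins Hospital."
)

# Hard-coded here; a real pipeline would get these spans from a PHI NER model.
spans = [
    span_of(original, "AIQING", "NAME"),
    span_of(original, "25", "AGE"),
    span_of(original, "Beijing", "LOCATION"),
    span_of(original, "Johns Hopkins Hospital", "LOCATION"),
]

# Tagging: replace each span with its label.
tagged = replace_spans(original, spans, lambda value, label: f"<{label}>")

# Obfuscation: replace each span with a surrogate value. Fixed values are
# used here for reproducibility; real obfuscation would generate them.
surrogates = {
    "AIQING": "BLONDIE",
    "25": "35",
    "Beijing": "Talihina",
    "Johns Hopkins Hospital": "Mid-Columbia Medical Center",
}
obfuscated = replace_spans(original, spans, lambda value, label: surrogates[value])

print(tagged)
print(obfuscated)
```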

26
Real-World De-Identification
Far more complex than obfuscating text:
• Compliance requirements differ by country & usage
• Technical, data, and personnel controls must be put in place
• Structured data, unstructured text, image, audio, video, and sensor
data should be jointly de-identified
• Text de-identification is also a computer vision challenge when PDF,
DICOM, and image files are involved
SOTA metrics:
• Similar accuracy to what human domain experts achieve
Source: https://demo.johnsnowlabs.com/ocr/DEID_PDF_HIPAA/
Benchmark Spark NLP Current Best
2014 n2c2 0.961 0.955

27
Agenda
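1. Personalized Medicine has a big NLP challenge
2. State-of-the-art accuracy has recently improved
3. Real-world solutions require hyper-specialization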

28
Healthcare has hundreds of languages

29
Clinical NER: 100+ Entities and Going
Posology NER
Anatomy NER
PHI NER

30
30+ Medical Relations & Counting
• Overusing relation extraction will create many relations, not
all relevant to your use case
• 90% accuracy means that 2-4 relation extraction models will
show 1-4 errors in every paragraph.
• Your UI must assume some incorrect results and help the
user choose between them
• Was a drug given in order to treat a symptom?
• Or was the symptom a result of taking the drug?
• Or was the drug avoided because of the symptom?
• Maybe the drug caused the symptom to worsen?
Source: https://nlp.johnsnowlabs.com/demo
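The point about 90% accuracy is simple arithmetic, sketched below. The calculation assumes errors are independent and that each model proposes between 5 and 10 relations per paragraph. Both are assumptions, chosen as one set of conditions under which the "1-4 errors" figure works out, and are not stated in the deck.

```python
def expected_errors(accuracy, n_models, relations_per_model):
    """Expected number of wrong relations, assuming independent errors."""
    return n_models * relations_per_model * (1.0 - accuracy)


def prob_all_correct(accuracy, n_relations):
    """Probability that every one of n_relations predictions is right."""
    return accuracy ** n_relations


# Assumed range: 2 models x 5 relations up to 4 models x 10 relations.
low = expected_errors(0.90, n_models=2, relations_per_model=5)
high = expected_errors(0.90, n_models=4, relations_per_model=10)
print(f"expected errors per paragraph: {low:.1f} to {high:.1f}")

# Chance that a paragraph with 10 extracted relations has no error at all.
print(f"P(no errors in 10 relations) = {prob_all_correct(0.90, 10):.3f}")
```

Under these assumptions a paragraph with no errors is the exception, which is why the UI needs to expect incorrect results.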

31
Entity Normalization
Normalization can be about mapping to an ontology, spell checking, changing units, …
Before → After
adalimumab 54.5 + 43.2 gm → adalimumab 97700 mg
Agnogenic one half cup → Agnogenic 0.5 oral solution
interferon alfa-2b 10 million unit ( 1 ml ) injec → interferon alfa - 2b 10000000 unt ( 1 ml ) injection
Source: https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes/
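Two of the transformations in the examples above (summing amounts and converting grams to milligrams, and expanding "10 million" to digits) are mechanical enough to sketch with plain rules. The code below is a hypothetical illustration and not the Spark NLP normalizer. The function names, the unit table, and the supported number words are assumptions.

```python
import re

UNIT_TO_MG = {"mg": 1.0, "g": 1000.0, "gm": 1000.0}
MULTIPLIERS = {"thousand": 10**3, "million": 10**6}


def normalize_mass(text):
    """Sum '+'-separated amounts and express the total in mg."""
    match = re.fullmatch(r"\s*([\d.\s+]+?)\s*(mg|gm|g)\s*", text)
    if not match:
        raise ValueError(f"unrecognized dose: {text!r}")
    amounts = [float(part) for part in match.group(1).split("+")]
    total_mg = sum(amounts) * UNIT_TO_MG[match.group(2)]
    return f"{round(total_mg)} mg"


def expand_number_words(text):
    """Rewrite '10 million' style quantities as plain digits."""
    def to_digits(match):
        value = float(match.group(1)) * MULTIPLIERS[match.group(2).lower()]
        return str(int(value))

    return re.sub(
        r"(\d+(?:\.\d+)?)\s+(thousand|million)\b",
        to_digits,
        text,
        flags=re.IGNORECASE,
    )


print(normalize_mass("54.5 + 43.2 gm"))
print(expand_number_words("interferon alfa-2b 10 million unit ( 1 ml )"))
```

The remaining cases on the slide, such as "one half cup" or completing "injec" to "injection", depend on vocabulary and context rather than arithmetic, which is where spell checking and ontology mapping come in.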

32
De-identification:
Use document & specialty-specific models
Source | Output | Comments
Reid A. Smith, M.D. | <PERSON> A. <PERSON>, M.D. | Doctor names don’t have to be de-identified per HIPAA.
KIDNEY STONE SURGERY | KIDNEY <PERSON> SURGERY | Polysemic terms (sometimes a person’s name, but not always)
Patient with Parkinson. | Patient with <PERSON>. | Diseases that are surnames
Fergus Falls. | <PERSON> Falls. | City “Fergus Falls” in Minnesota, versus someone named Fergus who is falling.

33
Considerations for a healthcare NLP solution
Trainable, Clinically Validated, Private
[Chart: survey responses by role in organization (Technical Leaders vs. All Respondents)]
Source: gradientflow.com

34
Summary
1.
2.
3.

35
Thank you!
© 2015-2021 John Snow Labs Inc. All rights reserved. The John Snow Labs logo is a trademark of John Snow Labs Inc. The included information is for informational purposes only and represents the
current view of John Snow Labs as of the date of this presentation. Since John Snow Labs must respond to changing market conditions, it should not be interpreted to be a commitment on its part, and John
Snow Labs cannot guarantee the accuracy of any information provided after the date of this presentation. John Snow Labs makes no warranties, express or statutory, as to the information in this presentation.
Live demos: www.johnsnowlabs.com/spark-nlp-in-action/
Free trial: www.johnsnowlabs.com/spark-nlp-try-free/
