Dr. David Talby, CTO, John Snow Labs
Real-World Lessons from Applying
Natural Language Processing to
Personalized Healthcare
2
Agenda
1. Personalized Medicine has a big NLP challenge
2. State-of-the-art accuracy has recently improved
3. Real-world solutions require hyper-specialization
3
Understanding
the Language
of Healthcare
4
John Snow Labs: The Team Behind Spark NLP
• Most popular NLP library in the enterprise (O’Reilly Media)
• 54% of healthcare AI teams use Spark NLP (Gradient Flow)
• 9x growth in downloads of the library during 2020 (PyPI Download Stats)
5
Today’s Medicine Is Imprecise
80%
non-responders for
the top 20 drugs
So, how shall we fix it?
6
Case Study #1:
Recommending The Next Best Action
• A clinical guideline is a document that guides decisions and
criteria regarding diagnosis, management, and treatment.
• Modern medical guidelines are based on an examination of
current evidence within the paradigm of evidence-based
medicine.
• They usually summarize consensus statements on best
practice.
• A doctor is obliged to know the medical guidelines of their
profession and must decide whether to follow them for an
individual patient.
7
Where Is The Clinically Relevant
Oncology Data?
“A detailed study found that only 40% of about 300 data points required for clinical decision support were available in structured data.”
8
ONCOLOGY: Information Extraction
Examples
Token (Word)              Entity Category (Label)
Upper outer quadrant      Tumor Site (Localization)
pT1                       Primary Tumor (pT1)
pN1                       Regional Lymph Nodes (pN1)
pM1                       Distant Metastasis (pM1)
Left                      Specimen Laterality (Laterality)
4.5 x 2.0 x 2.0 cm        Size of Invasive Carcinoma (Size)
Medullary Carcinoma       Histologic Type (Type)
Poorly Differentiated     Histologic Grade (Grade)
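To make the token-to-label mapping above concrete, here is a minimal rule-based sketch in Python. It is not how Spark NLP extracts these entities (that relies on trained models); the regular expressions, the `extract_entities` helper, and the sample report sentence are all assumptions made for illustration, and they cover only a few of the labels in the table.

```python
import re

# Hypothetical patterns for a few of the entity categories in the table.
PATTERNS = {
    "Primary Tumor": re.compile(r"\bpT\w+"),
    "Regional Lymph Nodes": re.compile(r"\bpN\w+"),
    "Distant Metastasis": re.compile(r"\bpM\w+"),
    "Laterality": re.compile(r"\b(?:left|right|bilateral)\b", re.IGNORECASE),
    # Two or three dimensions separated by "x", followed by "cm".
    "Size": re.compile(r"\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?){1,2}\s*cm"),
}


def extract_entities(report):
    """Return (text, label) pairs in the order they appear in the report."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(report):
            found.append((match.start(), match.group(0), label))
    return [(text, label) for _, text, label in sorted(found)]


# Invented sample sentence, not taken from a real pathology report.
sample_report = (
    "Specimen laterality: Left. Tumor site: upper outer quadrant. "
    "Invasive carcinoma measures 4.5 x 2.0 x 2.0 cm. Stage: pT1 pN1 pM1."
)
entities = extract_entities(sample_report)
for text, label in entities:
    print(f"{text!r:25} {label}")
```

Rules like these break quickly on free-text categories such as tumor site or histologic type, which is one reason trained models are used for this task.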
9
Case Study #2:
Matching Patients To Clinical Trials
5,446 cutting-edge treatments are behind the gates of clinical trials:
• 1,870 Cancer drugs
• 139 Alzheimer’s drugs
• 94 Asthma drugs
• 71 COPD drugs
• 162 Pain drugs
• 138 Rheumatoid arthritis drugs
• 91 Parkinson’s drugs
• 68 HIV/AIDS drugs
• 154 Type 2 diabetes drugs
• 105 Psoriasis drugs
• 2,554 Other
SOURCE: Estimated based on Pharmaprojects, “Pharma R&D Annual Review 2017”
10
Not Enough People Participate
In Clinical Trials
The consequences, for one blockbuster
drug, due to the average delay in clinical
trial recruiting:
993 human lives lost
$2.9B revenue lost
5x drug price increase
SOURCE: Cutting Edge Information, FiercePharma, Life Extension Magazine
https://medrio.com/blog/overcoming-patient-recruitment-and-retention-hurdles/
11
The NLP Challenge:
Complex Enrollment Criteria
“Increasingly complex enrollment criteria made off-the-shelf models that extract findings, treatments, drugs, labs, etc. largely useless.”
12
Case Study #3:
Populating Patient Registries
• A patient registry is a database of facts about patients who are
affected by a particular condition.
• In rare diseases, they play an important role in the therapy
development pathway.
• They can identify candidates for clinical trials, help develop
standards of care,
and accelerate medical research.
• The average U.S. healthcare facility participates in 5 to 10
registries, sometimes due to regulatory requirements.
• Registries are usually filled in manually, implying significant
cost, time, and effort.
https://treat-nmd.org/patient-registries/
13
This Is Largely
An NLP Challenge, Too
“PLAN: Follow up with dietitian to manage hypertension & borderline LDL and increase dosage of lisinopril since BP was 152/85 today” (for Patient K)
Source: https://www.ahrq.gov/ncepcr/tools/pf-handbook/mod20-appendix-b.html
14
Why this NLP Challenge is Important
• It takes weeks to manually update a registry (how fast
can we get real-world data on COVID-19 vaccines and
long-term impact?)
• It delays clinical trials, by adding major projects to each
trial
• Trained nurses who do this work aren’t treating patients
• It’s expensive, making exploratory abstraction
15
Agenda
1. Personalized Medicine has a big NLP challenge
2. State-of-the-art accuracy has recently improved
3. Real-world solutions require hyper-specialization
16
“State of the Art Natural Language Processing”
What does that mean?
Academic SOTA:
• Best accuracy on public benchmarks
• Published as peer-reviewed papers
Applied SOTA adds:
• In production in multiple organizations
• Open source or open core
17
Clinical Named Entity Recognition
18
1. State of the Art Biomedical NER
Spark NLP for Healthcare holds the top spot on 8 of 11 benchmarks in the
‘Medical Named Entity Recognition’ category on ‘Papers With Code’
Benchmark entity types shown: Disease; Chemicals; Species; Cancer Genetics; Anatomy; Species; Cellular; Chemical, Disease
19
2. Real-World Use
• Interpreting millions of patient stories with deep learned OCR and NLP
  Stacy Ashworth, Executive Vice President of Clinical Innovation
• Spark NLP in action: Improving patient flow forecasting at Kaiser Permanente
  Santosh Kulkarni, Director of Product Mgmt., Kaiser Permanente
• A Unified CV, OCR, and NLP for Scalable Document Understanding
  Patrick Beukema, Ph.D., Senior ML Engineer, DocuSign
• Text Analytics and its Applications in the Pharma Industry
  Harsha Gurulingappa, Ph.D., Text Analytics Product Owner at Merck
• NLP in Oncology Real World Data: Opportunities to develop a true learning healthcare system
  George A. Komatsoulis, Ph.D., Chief of Bioinformatics at CancerLinQ
• Automated & Explainable Deep Learning for Clinical Language Understanding at Roche
  Vishakha Sharma, Ph.D., Principal Data Scientist, Roche
20
3. Optimized for Modern Platforms
• Optimized builds of Spark NLP for both Intel and Nvidia were
tested on AWS
• Task: Training a Named Entity Recognizer in French
• Achieving an F1-score of 89% requires at least 80 epochs with a
batch size of 512
• CPUs can train even faster by using a batch size of 1024, since
they can leverage all system memory beyond just 12GB
Intel-optimized Spark NLP on Cascade Lake is 19% faster and 46% cheaper
than GPU-optimized Spark NLP on Tesla V100
21
Clinical Assertion Status Detection
• Recognized entities are useless, in most real cases, without
also knowing their assertion status.
• This is a separate NLP task, with its own algorithms and deep
learning models to address.
• Assertion status models are domain-specific and must
sometimes be trained for your use case:
o Orthopedics: Is a condition chronic?
o Surgery: Is this a past event, a plan,
or proposed treatment?
o Insurance: Is this a statement of evidence,
question about a fact, or an answer?
For each recognized entity: Is this assertion positive, negative, suspected, conditional, or about someone else?
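As a rough illustration of what an assertion status decision looks like, the sketch below uses a few trigger phrases found before the entity. This is only a toy: the models this slide refers to are trained, domain-specific deep learning models, and the trigger lists, label names, and `assertion_status` function here are assumptions made for the example.

```python
import re

# Hypothetical trigger phrases, checked in priority order.
TRIGGERS = [
    ("someone_else", r"\b(?:mother|father|brother|sister|family history)\b"),
    ("negative", r"\b(?:no|denies|without|negative for)\b"),
    ("suspected", r"\b(?:possible|suspected|probable)\b"),
    ("conditional", r"\b(?:if|whenever)\b"),
]


def assertion_status(sentence, entity):
    """Classify an entity mention using only the words that precede it."""
    position = sentence.lower().find(entity.lower())
    if position < 0:
        raise ValueError(f"entity {entity!r} not found in sentence")
    context = sentence[:position].lower()
    for label, pattern in TRIGGERS:
        if re.search(pattern, context):
            return label
    return "positive"


print(assertion_status("Patient denies chest pain.", "chest pain"))
print(assertion_status("Mother had breast cancer.", "breast cancer"))
print(assertion_status("Suspected pneumonia on x-ray.", "pneumonia"))
print(assertion_status("Patient has hypertension.", "hypertension"))
```

A rule set like this cannot tell a chronic condition from a past event or a plan from a proposed treatment, which is why the specialty-specific cases above usually call for a trained model.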
22
Current SOTA Metrics
Source: https://arxiv.org/abs/2012.04005 Improving Clinical Document Understanding on COVID-19 Research with Spark NLP, Kocaman and Talby, SDU workshop at AAAI 2021
Peer-Reviewed Accuracy:
Assertion detection model test metrics. Our
implementation exceeds the benchmarks in the latest
best model in 4 out of 6 assertion labels – and in
overall accuracy.
Assertion Label     Spark NLP     Latest Best
Absent              0.944         0.937
Someone-else        0.904         0.869
Conditional         0.441         0.422
Hypothetical        0.862         0.890
Possible            0.680         0.630
Present             0.953         0.957
micro F1            0.939         0.934
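The "4 out of 6 assertion labels" claim can be checked directly against the table. The short Python sketch below does only that comparison; the numbers are copied from the table, while the variable names are arbitrary.

```python
# (Spark NLP, latest best) scores per assertion label, copied from the table.
scores = {
    "Absent": (0.944, 0.937),
    "Someone-else": (0.904, 0.869),
    "Conditional": (0.441, 0.422),
    "Hypothetical": (0.862, 0.890),
    "Possible": (0.680, 0.630),
    "Present": (0.953, 0.957),
}
micro_f1 = (0.939, 0.934)

# Labels on which Spark NLP scores strictly higher than the latest best model.
wins = [label for label, (spark, best) in scores.items() if spark > best]

print(f"Spark NLP is ahead on {len(wins)} of {len(scores)} labels: {wins}")
print(f"Ahead on micro F1: {micro_f1[0] > micro_f1[1]}")
```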
23
Relation Extraction
Identify and name relations between named entities.
Source: https://demo.johnsnowlabs.com/healthcare/RE_CLINICAL_EVENTS/
24
Current SOTA Metrics
Source: https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes – Spark NLP for Healthcare 2.7.3 Release Notes
Model                                                     Spark NLP     Current Best
Temporal relations between clinical events                71.0          80.2 [1]
Relations between symptoms, procedures, and treatments    69.2          68.2 [2]
Genes and human phenotypes                                87.9          67.2 [3]
Drug-drug interactions                                    72.1          83.8 [4]
Chemicals and proteins                                    94.1          83.64 [5]
25
Text De-Identification
Original Text:
Patient AIQING, 25 month years-old , born in Beijing, was transferred to the Johns Hopkins Hospital.
Obfuscated:
Patient BLONDIE, 35 years-old , born in Talihina, was transferred to the Mid-Columbia Medical Center.
Tagged:
Patient <NAME>, <AGE> month years-old , born in <LOCATION>, was transferred to the <LOCATION>.
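The tagged form of the example can be sketched as a span replacement step that runs after an NER model has located the sensitive entities. The Python below assumes the spans are already known (here they are located by hand with a hypothetical `find_span` helper); it does not show how Spark NLP detects them, and the `tag_entities` function is illustrative only.

```python
def find_span(text, surface, label):
    """Locate the first occurrence of `surface` and return (start, end, label)."""
    start = text.index(surface)
    return (start, start + len(surface), label)


def tag_entities(text, spans):
    """Replace each (start, end, label) span with a <LABEL> placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported")
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


original = (
    "Patient AIQING, 25 month years-old , born in Beijing, "
    "was transferred to the Johns Hopkins Hospital."
)
# Hand-written spans standing in for the output of an entity recognizer.
spans = [
    find_span(original, "AIQING", "NAME"),
    find_span(original, "25", "AGE"),
    find_span(original, "Beijing", "LOCATION"),
    find_span(original, "Johns Hopkins Hospital", "LOCATION"),
]
tagged = tag_entities(original, spans)
print(tagged)
```

Obfuscation follows the same pattern, except that each span is replaced with a realistic surrogate value of the same type instead of a placeholder.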
26
Real-World De-Identification
Far more complex than obfuscating text:
• Compliance requirements differ by country & usage
• Technical, data, and personnel controls must be put in place
• Structured data, unstructured text, image, audio, video, and sensor
data should be jointly de-identified
• Text de-identification is also a computer vision challenge when PDF,
DICOM, and image files are involved
SOTA metrics:
• Similar accuracy to what human domain experts achieve
Source: https://demo.johnsnowlabs.com/ocr/DEID_PDF_HIPAA/
Benchmark     Spark NLP     Current Best
2014 n2c2     0.961         0.955
27
Agenda
1. Personalized Medicine has a big NLP challenge
2. State-of-the-art accuracy has recently improved
3. Real-world solutions require hyper-specialization
28
Healthcare has hundreds of languages
29
Clinical NER: 100+ Entities and Going
Posology NER
Anatomy NER
PHI NER
30
30+ Medical Relations & Counting
• Overusing relation extraction will create many relations, not
all relevant to your use case
• 90% accuracy means that 2-4 relation extraction models will
show 1-4 errors in every paragraph.
• Your UI must assume some incorrect results and help the
user choose between them
• Was a drug given in order to treat a symptom?
• Or was the symptom a result of taking the drug?
• Or was the drug avoided because of the symptom?
• Maybe the drug caused the symptom to worsen?
Source: https://nlp.johnsnowlabs.com/demo
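The "1-4 errors in every paragraph" estimate is simple expected-value arithmetic. The sketch below reproduces that range under one loudly stated assumption that is not in the slides: each relation extraction model proposes roughly 5 to 10 relations per paragraph. The `expected_errors` function is hypothetical.

```python
def expected_errors(num_models, relations_per_model, accuracy):
    """Expected number of incorrect relations per paragraph."""
    return num_models * relations_per_model * (1.0 - accuracy)


# Assumed workload: 5 to 10 predicted relations per model per paragraph.
low = expected_errors(num_models=2, relations_per_model=5, accuracy=0.90)
high = expected_errors(num_models=4, relations_per_model=10, accuracy=0.90)
print(f"about {low:.1f} to {high:.1f} expected errors per paragraph")
```

With different relation counts the range shifts, but the point for UI design stays the same: some of the displayed relations will be wrong, so the user needs a way to review and correct them.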
31
Entity Normalization
Normalization can be about mapping to an ontology, spell checking, changing units, …
adalimumab 54.5 + 43.2 gm → adalimumab 97700 mg
Agnogenic one half cup → Agnogenic 0.5 oral solution
interferon alfa-2b 10 million unit ( 1 ml ) injec → interferon alfa - 2b 10000000 unt ( 1 ml ) injection
Source: https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes/
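The unit-changing part of normalization can be illustrated with the adalimumab example: 54.5 g + 43.2 g = 97.7 g = 97700 mg. The Python sketch below handles only that one pattern (summed gram amounts converted to milligrams); the regular expression and the `normalize_grams_to_mg` function are assumptions for illustration, not the Spark NLP implementation.

```python
import re
from decimal import Decimal

# Matches "<drug> <amount> [+ <amount> ...] gm", e.g. "adalimumab 54.5 + 43.2 gm".
GRAMS = re.compile(
    r"^(?P<drug>\w+)\s+"
    r"(?P<amounts>\d+(?:\.\d+)?(?:\s*\+\s*\d+(?:\.\d+)?)*)\s*gm$"
)


def normalize_grams_to_mg(text):
    """Sum gram amounts and express the total in mg; return other text unchanged."""
    match = GRAMS.match(text.strip())
    if match is None:
        return text
    # Decimal avoids binary floating-point rounding noise in the sum.
    total_g = sum(Decimal(part.strip()) for part in match["amounts"].split("+"))
    total_mg = total_g * 1000
    if total_mg == total_mg.to_integral_value():
        amount = str(int(total_mg))
    else:
        amount = str(total_mg)
    return f"{match['drug']} {amount} mg"


print(normalize_grams_to_mg("adalimumab 54.5 + 43.2 gm"))
```

The other two examples ("one half cup", "10 million unit") need number-word parsing and vocabulary mapping, which is where rules alone become hard to maintain.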
32
De-identification:
Use document & specialty-specific models
Source                      Output                         Comments
Reid A. Smith, M.D.         <PERSON> A. <PERSON>, M.D.     Doctor names don’t have to be de-identified per HIPAA.
KIDNEY STONE SURGERY        KIDNEY <PERSON> SURGERY        Polysemic terms (sometimes a person’s name, but not always)
Patient with Parkinson.     Patient with <PERSON>.         Diseases that are surnames
Fergus Falls.               <PERSON> Falls.                City “Fergus Falls” in Minnesota, versus someone named Fergus who is falling.
33
Considerations for a healthcare NLP solution
• Trainable
• Clinically Validated
• Private
[Chart: survey responses by role in organization (Technical Leaders vs. All Respondents)]
Source: gradientflow.com
34
Summary
1. Personalized Medicine has a big NLP challenge
2. State-of-the-art accuracy has recently improved
3. Real-world solutions require hyper-specialization
35
Thank you!
© 2015-2021 John Snow Labs Inc. All rights reserved. The John Snow Labs logo is a trademark of John Snow Labs Inc. The included information is for informational purposes only and represents the
current view of John Snow Labs as of the date of this presentation. Since John Snow Labs must respond to changing market conditions, it should not be interpreted to be a commitment on its part, and John
Snow Labs cannot guarantee the accuracy of any information provided after the date of this presentation. John Snow Labs makes no warranties, express or statutory, as to the information in this presentation.
Live demos: www.johnsnowlabs.com/spark-nlp-in-action/
Free trial: www.johnsnowlabs.com/spark-nlp-try-free/

Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Applying NLP to Personalized Healthcare - 2021

  • 1. Dr. David Talby, CTO, John Snow Labs Real-World Lessons from Applying Natural Language Processing to Personalized Healthcare
  • 2. Agenda: 1. Personalized Medicine has a big NLP challenge 2. State-of-the-art accuracy has recently improved 3. Real-world solutions require hyper-specialization
  • 4. John Snow Labs: The Team Behind Spark NLP
      • Most popular NLP library in the enterprise (O’Reilly Media)
      • 54% share of healthcare AI teams use Spark NLP (Gradient Flow)
      • 9x growth in downloads of the library during 2020 (PyPI Download Stats)
  • 5. Today’s Medicine Is Imprecise: 80% non-responders for the top 20 drugs. So, how shall we fix it?
  • 6. Case Study: Recommending The Next Best Action
      • A clinical guideline is a document that guides decisions and criteria regarding diagnosis, management, and treatment.
      • Modern medical guidelines are based on an examination of current evidence within the paradigm of evidence-based medicine.
      • They usually summarize consensus statements on best practice.
      • A doctor is obliged to know the medical guidelines of their profession and must decide whether to follow them for an individual patient.
  • 7. Where Is The Clinically Relevant Oncology Data? “A detailed study found that only 40% of about 300 data points required for clinical decision support were available in structured data”
  • 8. ONCOLOGY: Information Extraction Examples
      Token (Word) | Entity Category (Label)
      Upper outer quadrant | Tumor Site (Localization)
      pT1 | Primary Tumor (pT1)
      pN1 | Regional Lymph Nodes (pN1)
      pM1 | Distant Metastasis (pM1)
      Left | Specimen Laterality (Laterality)
      4.5 x 2.0 x 2.0 cm | Size of Invasive Carcinoma (Size)
      Medullary Carcinoma | Histologic Type (Type)
      Poorly Differentiated | Histologic Grade (Grade)
  • 9. Case Study #2: Matching Patients To Clinical Trials. 5,446 cutting-edge treatments are behind the gates of clinical trials:
      • 1,870 Cancer drugs
      • 139 Alzheimer’s drugs
      • 94 Asthma drugs
      • 71 COPD drugs
      • 162 Pain drugs
      • 138 Rheumatoid arthritis drugs
      • 91 Parkinson’s drugs
      • 68 HIV/AIDS drugs
      • 154 Type 2 diabetes drugs
      • 105 Psoriasis drugs
      • 2,554 Other
      SOURCE: Estimated based on Pharmaprojects, “Pharma R&D Annual Review 2017”
  • 10. Not Enough People Participate In Clinical Trials. The consequences, for one blockbuster drug, due to the average delay in clinical trial recruiting:
      • 993 human lives lost
      • $2.9B revenue lost
      • 5x drug price increase
      SOURCE: Cutting Edge Information, FiercePharma, Life Extension Magazine https://medrio.com/blog/overcoming-patient-recruitment-and-retention-hurdles/
  • 11. The NLP Challenge: Complex Enrollment Criteria. “Increasingly complex enrollment criteria made off-the-shelf models that extract findings, treatments, drugs, labs, etc. largely useless.”
  • 12. Case Study #3: Populating Patient Registries
      • A patient registry is a database of facts about patients who are affected by a particular condition.
      • In rare diseases, they play an important role in the therapy development pathway.
      • They can identify candidates for clinical trials, help develop standards of care, and accelerate medical research.
      • The average U.S. healthcare facility participates in 5 to 10 registries, sometimes due to regulatory requirements.
      • Registries are usually filled in manually, implying significant cost, time, and effort.
      Source: https://treat-nmd.org/patient-registries/
  • 13. This Is Largely An NLP Challenge, Too. “PLAN: Follow up with dietitian to manage hypertension & borderline LDL and increase dosage of lisinopril since BP was 152/85 today” (for Patient K) Source: https://www.ahrq.gov/ncepcr/tools/pf-handbook/mod20-appendix-b.html
  • 14. Why this NLP Challenge is Important
      • It takes weeks to manually update a registry (how fast can we get real-world data on COVID-19 vaccines and long-term impact?)
      • It delays clinical trials, by adding major projects to each trial
      • Trained nurses who do this work aren’t treating patients
      • It’s expensive, making exploratory abstraction
  • 15. Agenda: 1. Personalized Medicine has a big NLP challenge 2. State-of-the-art accuracy has recently improved 3. Real-world solutions require hyper-specialization
  • 16. “State of the Art Natural Language Processing”: What does that mean?
      Academic SOTA:
      • Best accuracy on public benchmarks
      • Published as peer-reviewed papers
      Applied SOTA adds:
      • In production in multiple organizations
      • Open source or open core
  • 18. 1. State of the Art Biomedical NER. Spark NLP for Healthcare holds the top spot on 8 of 11 benchmarks in the ‘Medical Named Entity Recognition’ category on ‘Papers With Code’. Benchmark entity types shown: Disease; Chemicals; Species; Cancer Genetics; Anatomy; Species; Cellular; Chemical, Disease
  • 19. 2. Real-World Use
      • Interpreting millions of patient stories with deep learned OCR and NLP (Stacy Ashworth, Executive Vice President of Clinical Innovation)
      • Spark NLP in action: Improving patient flow forecasting at Kaiser Permanente (Santosh Kulkarni, Director of Product Mgmt., Kaiser Permanente)
      • A Unified CV, OCR, and NLP for Scalable Document Understanding (Patrick Beukema, Ph.D., Senior ML Engineer, DocuSign)
      • Text Analytics and its Applications in the Pharma Industry (Harsha Gurulingappa, Ph.D., Text Analytics Product Owner at Merck)
      • NLP in Oncology Real World Data: Opportunities to develop a true learning healthcare system (George A. Komatsoulis, Ph.D., Chief of Bioinformatics at CancerLinQ)
      • Automated & Explainable Deep Learning for Clinical Language Understanding at Roche (Vishakha Sharma, Ph.D., Principal Data Scientist, Roche)
  • 20. 3. Optimized for Modern Platforms
      • Optimized builds of Spark NLP for both Intel and Nvidia were tested on AWS
      • Task: Training a Named Entity Recognizer in French
      • Achieving an F1-score of 89% requires at least 80 epochs with a batch size of 512
      • CPUs can train even faster by using a batch size of 1024, since they can leverage all system memory beyond just 12GB
      Intel-optimized Spark NLP on Cascade Lake is 19% faster and 46% cheaper than GPU-optimized Spark NLP on Tesla V100
  • 21. Clinical Assertion Status Detection
      • Recognized entities are useless, in most real cases, without also knowing their assertion status.
      • This is a separate NLP task, with its own algorithms and deep learning models to address.
      • Assertion status models are domain-specific and must sometimes be trained for your use case:
          o Orthopedics: Is a condition chronic?
          o Surgery: Is this a past event, a plan, or proposed treatment?
          o Insurance: Is this a statement of evidence, question about a fact, or an answer?
      For each recognized entity: Is this assertion positive, negative, suspected, conditional, or about someone else?
  • 22. Current SOTA Metrics. Peer-Reviewed Accuracy: Assertion detection model test metrics. Our implementation exceeds the benchmarks in the latest best model in 4 out of 6 assertion labels, and in overall accuracy.
      Assertion Label | Spark NLP | Latest Best
      Absent | 0.944 | 0.937
      Someone-else | 0.904 | 0.869
      Conditional | 0.441 | 0.422
      Hypothetical | 0.862 | 0.890
      Possible | 0.680 | 0.630
      Present | 0.953 | 0.957
      micro F1 | 0.939 | 0.934
      Source: https://arxiv.org/abs/2012.04005 Improving Clinical Document Understanding on COVID-19 Research with Spark NLP, Kocaman and Talby, SDU workshop at AAAI 2021
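The "4 out of 6" claim on this slide can be checked directly from the table. The following is a small sketch of that check; the dictionary simply restates the slide's numbers, and the variable names are illustrative.

```python
# Per-label scores copied from the slide: (Spark NLP, latest best).
assertion_scores = {
    "Absent": (0.944, 0.937),
    "Someone-else": (0.904, 0.869),
    "Conditional": (0.441, 0.422),
    "Hypothetical": (0.862, 0.890),
    "Possible": (0.680, 0.630),
    "Present": (0.953, 0.957),
}
micro_f1 = (0.939, 0.934)

# Labels on which the Spark NLP score is strictly higher.
wins = [label for label, (ours, best) in assertion_scores.items() if ours > best]

print(f"{len(wins)} of {len(assertion_scores)} labels improved: {wins}")
print("micro F1 improved:", micro_f1[0] > micro_f1[1])
```

The two labels where the earlier model still leads are Hypothetical and Present.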
  • 23. Relation Extraction: Identify and name relations between named entities. Source: https://demo.johnsnowlabs.com/healthcare/RE_CLINICAL_EVENTS/
  • 24. Current SOTA Metrics
      Model | Spark NLP | Current Best
      Temporal relations between clinical events | 71.0 | 80.2
      Relations between symptoms, procedures, and treatments | 69.2 | 68.2
      Genes and human phenotypes | 87.9 | 67.2
      Drug-drug interactions | 72.1 | 83.8
      Chemicals and proteins | 94.1 | 83.64
      Source: https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes (Spark NLP for Healthcare 2.7.3 Release Notes)
  • 25. Text De-Identification
      Original Text: Patient AIQING, 25 month years-old , born in Beijing, was transferred to the Johns Hopkins Hospital.
      Obfuscated: Patient BLONDIE, 35 years-old , born in Talihina, was transferred to the Mid-Columbia Medical Center.
      Tagged: Patient <NAME>, <AGE> month years-old , born in <LOCATION>, was transferred to the <LOCATION>.
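The difference between the tagged and obfuscated outputs can be illustrated with a toy sketch. This is not the Spark NLP API: the entity list is hard-coded as an assumption (in practice it would come from a PHI NER model), and `find_spans`, `replace_spans`, and the surrogate table are hypothetical helpers introduced only for illustration.

```python
text = ("Patient AIQING, 25 month years-old , born in Beijing, "
        "was transferred to the Johns Hopkins Hospital.")

# Assumed NER output: (surface form, label) in reading order.
entities = [
    ("AIQING", "NAME"),
    ("25", "AGE"),
    ("Beijing", "LOCATION"),
    ("Johns Hopkins Hospital", "LOCATION"),
]

def find_spans(text, entities):
    """Locate each entity left to right and return (start, end, label) spans."""
    spans, cursor = [], 0
    for surface, label in entities:
        start = text.index(surface, cursor)
        end = start + len(surface)
        spans.append((start, end, label))
        cursor = end
    return spans

def replace_spans(text, spans, replacement_for):
    """Rewrite spans right to left so earlier offsets stay valid."""
    out = text
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + replacement_for(text[start:end], label) + out[end:]
    return out

spans = find_spans(text, entities)

# Tagging: replace each entity with its label.
tagged = replace_spans(text, spans, lambda surface, label: f"<{label}>")

# Obfuscation: replace each entity with a surrogate of the same type.
# The surrogate values are taken from the slide; the lookup is illustrative.
surrogates = {
    "AIQING": "BLONDIE",
    "25": "35",
    "Beijing": "Talihina",
    "Johns Hopkins Hospital": "Mid-Columbia Medical Center",
}
obfuscated = replace_spans(text, spans, lambda surface, label: surrogates[surface])

print(tagged)
print(obfuscated)
```

Tagging keeps the text auditable, while obfuscation keeps it readable for downstream models. A real system would generate surrogates consistently per patient rather than from a fixed table.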
  • 26. Real-World De-Identification. Far more complex than obfuscating text:
      • Compliance requirements differ by country & usage
      • Technical, data, and personnel controls must be placed
      • Structured data, unstructured text, image, audio, video, and sensor data should be jointly de-identified
      • Text de-identification is also a computer vision challenge when PDF, DICOM, and image files are involved
      SOTA metrics:
      • Similar accuracy to what human domain experts achieve
      Benchmark | Spark NLP | Current Best
      2014 n2c2 | 0.961 | 0.955
      Source: https://demo.johnsnowlabs.com/ocr/DEID_PDF_HIPAA/
  • 27. Agenda: 1. Personalized Medicine has a big NLP challenge 2. State-of-the-art accuracy has recently improved 3. Real-world solutions require hyper-specialization
  • 29. Clinical NER: 100+ Entities and Going. Examples: Posology NER, Anatomy NER, PHI NER
  • 30. 30+ Medical Relations & Counting
      • Overusing relation extraction will create many relations, not all relevant to your use case
      • 90% accuracy means that 2-4 relation extraction models will show 1-4 errors in every paragraph.
      • Your UI must assume some incorrect results and help the user choose between them
      • Was a drug given in order to treat a symptom?
      • Or was the symptom a result of taking the drug?
      • Or was the drug avoided because of the symptom?
      • Maybe the drug caused the symptom to worsen?
      Source: https://nlp.johnsnowlabs.com/demo
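The error estimate on this slide follows from simple expected-value arithmetic. The sketch below assumes each model emits between 5 and 10 relations per paragraph; that range is an assumption chosen for illustration and is not stated in the slides.

```python
def expected_errors(num_models, relations_per_model, accuracy):
    """Expected number of wrong relations in one paragraph."""
    return num_models * relations_per_model * (1.0 - accuracy)

# Assumption: 5 to 10 extracted relations per model per paragraph.
low = expected_errors(num_models=2, relations_per_model=5, accuracy=0.90)
high = expected_errors(num_models=4, relations_per_model=10, accuracy=0.90)

print(f"expected errors per paragraph: {low:.1f} to {high:.1f}")
```

Under these assumptions the estimate spans roughly one to four errors per paragraph, which is why the UI has to help users review and reject results.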
  • 31. Entity Normalization. Normalization can be about mapping to an ontology, spell checking, changing units, …
      Input | Normalized
      adalimumab 54.5 + 43.2 gm | adalimumab 97700 mg
      Agnogenic one half cup | Agnogenic 0.5 oral solution
      interferon alfa-2b 10 million unit ( 1 ml ) injec | interferon alfa - 2b 10000000 unt ( 1 ml ) injection
      Source: https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes/
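The unit-changing part of normalization can be sketched for the first example on this slide (54.5 + 43.2 gm becoming 97700 mg). This is a minimal illustration, not the library's normalizer: `normalize_dose` and the `TO_MG` table are hypothetical, and only summed mass quantities are handled.

```python
import re
from decimal import Decimal

# Conversion factors to milligrams (illustrative subset).
TO_MG = {"mg": Decimal("1"), "g": Decimal("1000"), "gm": Decimal("1000")}

def normalize_dose(raw):
    """Sum '+'-separated amounts and express the total in mg."""
    match = re.fullmatch(r"\s*([\d.]+(?:\s*\+\s*[\d.]+)*)\s*([a-zA-Z]+)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized dose: {raw!r}")
    amounts, unit = match.groups()
    # Decimal avoids binary floating-point noise such as 97700.00000000001.
    total = sum(Decimal(part.strip()) for part in amounts.split("+"))
    milligrams = total * TO_MG[unit.lower()]
    # normalize() drops trailing zeros; the 'f' format avoids exponent notation.
    return f"{milligrams.normalize():f} mg"

print(normalize_dose("54.5 + 43.2 gm"))
```

Mapping free-text quantities such as "one half cup" or abbreviations such as "injec" needs lookup tables or learned models rather than arithmetic, so those cases are out of scope for this sketch.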
  • 32. De-identification: Use document & specialty-specific models
      Source | Output | Comments
      Reid A. Smith, M.D. | <PERSON> A. <PERSON>, M.D. | Doctor names don’t have to be de-identified per HIPAA.
      KIDNEY STONE SURGERY | KIDNEY <PERSON> SURGERY | Polysemic terms (sometimes a person’s name, but not always)
      Patient with Parkinson. | Patient with <PERSON>. | Diseases that are surnames
      Fergus Falls. | <PERSON> Falls. | City “Fergus Falls” in Minnesota, versus someone named Fergus who is falling.
  • 33. Considerations for a healthcare NLP solution: Trainable; Clinically Validated; Private. Chart breakdown by Role in Organization: Technical Leaders, All Respondents. Source: gradientflow.com
  • 34. Summary: 1. Personalized Medicine has a big NLP challenge 2. State-of-the-art accuracy has recently improved 3. Real-world solutions require hyper-specialization
  • 35. Thank you! Live demos: www.johnsnowlabs.com/spark-nlp-in-action/ Free trial: www.johnsnowlabs.com/spark-nlp-try-free/ © 2015-2021 John Snow Labs Inc. All rights reserved. The John Snow Labs logo is a trademark of John Snow Labs Inc. The included information is for informational purposes only and represents the current view of John Snow Labs as of the date of this presentation. Since John Snow Labs must respond to changing market conditions, it should not be interpreted to be a commitment on its part, and John Snow Labs cannot guarantee the accuracy of any information provided after the date of this presentation. John Snow Labs makes no warranties, express or statutory, as to the information in this presentation.