SlideShare a Scribd company logo
1 of 29
Dr. David Talby
NATURAL LANGUAGE UNDERSTANDING IN HEALTHCARE:
STATE OF THE ART NLP, MACHINE LEARNING AND DEEP
LEARNING WITH OPEN SOURCE SOFTWARE
CONTENTS
1. THE IMPORTANCE OF NLP IN HEALTHCARE AI
2. NLP IS ULTRA DOMAIN SPECIFIC
3. STATE OF THE ART RESEARCH RESULTS
4. SPARK NLP: PRODUCTION GRADE & AT SCALE
AI VS. DOCTORS
Deep Learning
Computer
Vision
Access to Care
Diagnostic
Accuracy
NLP IN HEALTHCARE
Deep Learning
NLP
Efficiency
Accuracy
Radiology Diagnostic
Mental
Health
Safety
Events
Inpatient
Pre-
Auth
Key
Opinion
Leaders
Research
Meta
Analysis
Clinical
Coding
Financial
Anti-
Fraud
Adverse
Events
Drug Development
Recruit
for Trials
RISK PREDICTION CASE STUDY: DETECTING SEPSIS
“Compared to previous work that only
used structured data such as vital signs
and demographic information, utilizing
free text drastically improves the
discriminatory ability (increase in AUC
from 0.67 to 0.86) of identifying
infection.”
COHORT SELECTION CASE STUDY: ONCOLOGY
“Using the combination of structured and
unstructured data, 8324 patients were
identified as having advanced NSCLC.
Of these patients, only 2472 were also in the
cohort generated using structured data only.
Further, 1090 patients would be included in the
structured data only cohort who should have
been excluded based on additional data.”
CONTENTS
1. THE IMPORTANCE OF NLP IN HEALTHCARE AI
2. NLP IS ULTRA DOMAIN SPECIFIC
3. STATE OF THE ART RESEARCH RESULTS
4. SPARK NLP: PRODUCTION GRADE & AT SCALE
WHAT MAKES LANGUAGE HARD
• Nuanced
– Sure / I agree / Absolutely! / Whatever / Yes sir / Just to see you smile ❤️
• Fuzzy
– Blue, New, Tall, Child, Tell, Do
• Contextual
– “Patient denies alcohol abuse”
• Medium specific
– “SGTM c u in 15”
• Domain specific
– All forward-looking statements included in this document are based on
information available to us on the date hereof, and we assume no obligation to
revise or publicly release any revision to any such forward-looking statement,
except as may otherwise be required by law.
We all speak many languages.
THE FIRST RULE OF NLP:
ED Triage Notes
states started last night, upper abd, took alka seltzer approx
0500, no relief. nausea no vomiting
Since yeatreday 10/10 "constant Tylenol 1 hr ago. +nausea.
diaphoretic. Mid abd radiates to back
Generalized abd radiating to lower x 3 days accompanied
by dark stools. Now with bloody stool this am. Denies dizzy,
sob, fatigue. Visiting from Japan on business.”
Features
Type of Pain
Intensity of Pain
Body part of region
Symptoms
Onset of symptoms
Attempted home remedy
EMERGENCY ROOM LANGUAGE
Different Vocabulary
Different Grammar
Different Context
Different Meaning
Different
Language Models
Tokenizer Normalizer
Lemmatizer Fact Extraction
Part of Speech Tagger Spell Checker
Coreference Resolution Dependency Parser
Sentence Splitting Negation Detection
Named Entity Recognition Sentiment Analysis
Intent Classification Summarization
Word Embeddings Emotion Detection
Question Answering Relevance Ranking
Best Next Action Translation
CONTENTS
1. THE IMPORTANCE OF NLP IN HEALTHCARE AI
2. NLP IS ULTRA DOMAIN SPECIFIC
3. STATE OF THE ART RESEARCH RESULTS
4. SPARK NLP: PRODUCTION GRADE & AT SCALE
NAMED ENTITY RECOGNITION
“DEEP LEARNED” NER
“Entity Recognition from Clinical Texts via Recurrent Neural Network”.
Liu et al., BMC Medical Informatics & Decision Making, July 2017.
F-Score Dataset Task
85.81% 2010 i2b2 Medical concept extraction
92.29% 2012 i2b2 Clinical event detection
94.37% 2014 i2b2 De-identification
ENTITY RESOLUTION
“DEEP LEARNED” ENTITY RESOLUTION
“CNN-based ranking for biomedical entity normalization”.
Li et al., BMC Bioinformatics, October 2017.
F-Score Dataset Task
90.30%
ShARe /
CLEF
Disease & problem norm.
92.29% NCBI Disease norm. in literature
ASSERTION STATUS DETECTION
Prescribing sick days due to diagnosis of influenza. Positive
Jane complains about flu-like symptoms. Speculative
Jane’s RIDT came back clean. Negative
Jane is at risk for flu if she’s not vaccinated. Conditional
Jane’s older brother had the flu last month. Family history
Jane had a severe case of flu last year. Patient history
“DEEP LEARNED” ASSERTION STATUS DETECTION
“Improving Classification of Medical Assertions in Clinical Notes“
Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, 2011.
Dataset Metric
94.17%
4th i2b2/VA
Mirco-averaged F1
79.76% Marco-averaged F1
WORD EMBEDDINGS
BIOMEDICAL WORD EMBEDDINGS
“How to Train Good Word Embeddings for Biomedical NLP”.
Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, 2011.
• Intrinsic measures (similarity between concepts)
• Extrinsic measures (improving NER)
• Hyperparameter optimization
• Sub-sampling
• Minimum-count
• Learning rate
• Vector dimension
• Context window size
• Available under an open license here
CONTENTS
1. THE IMPORTANCE OF NLP IN HEALTHCARE AI
2. NLP IS ULTRA DOMAIN SPECIFIC
3. STATE OF THE ART RESEARCH RESULTS
4. SPARK NLP: PRODUCTION GRADE & AT SCALE
NLP FOR APACHE SPARK
Data Sources API
Spark Core API (RDD’s, Project Tungsten)
Spark SQL API (DataFrame, Catalyst Optimizer)
Spark ML API (Pipeline, Transformer, Estimator)
Part of Speech Tagger
Named Entity Recognition
Sentiment Analysis
Spell Checker
Tokenizer
Stemmer
Lemmatizer
Entity Extraction
Topic Modeling
Word2Vec
TF-IDF
String distance calculation
N-grams calculation
Stop word removal
Train/Test & Cross-Validate
Ensembles
High Performance Natural Language Understanding at Scale
Design Goals
• State of the art Performance & Scale
• Frictionless Reuse
• Enterprise Grade
Built on the Spark ML API’s
Apache 2.0 Licensed
Active development & support
NLP FOR APACHE SPARK: COMBINED NLP & ML PIPELINES
pipeline = pyspark.ml.Pipeline(stages=[
document_assembler,
tokenizer,
stemmer,
normalizer,
stopword_remover,
tf,
idf,
lda])
topic_model = pipeline.fit(df)
Spark NLP annotators
Spark ML featurizers
Spark ML LDA implementation
Single execution plan for whole pipeline
PERFORMANCE BENCHMARKS
• Training was 80x faster to train on 2.6MB
• Training was 38x faster on 100k
• Training on 100k & 2.6MB took roughly the same
• Additional near-linear speedup on a cluster
• Prediction was 1.6x faster on 75MB
• Prediction was 1.4x faster on 15MB
• Adding NLP stages takes roughly the same
• Additional near-linear speedup on a cluster
HEALTHCARE EXTENSIONS
Data Sources API
Spark Core API (RDD’s, Project Tungsten)
Spark SQL API (DataFrame, Catalyst Optimizer)
Spark ML API (Pipeline, Transformer, Estimator)
Part of Speech Tagger
Named Entity Recognition
Sentiment Analysis
Spell Checker
Tokenizer
Stemmer
Lemmatizer
Entity Extraction
Topic Modeling
Word2Vec
TF-IDF
String distance calculation
N-grams calculation
Stop word removal
Train/Test & Cross-Validate
Ensembles
High Performance Natural Language Understanding at Scale
com.johnsnowlabs.nlp.clinical.*
Healthcare specific NLP
annotators for Spark in
Scala, Java or Python:
• Entity Recognition
• Value Extraction
• Word Embeddings
• Assertion Status
• Sentiment Analysis
• Spell Checking, …
data.johnsnowlabs.com/health
1,800+ Expert curated,
clean, linked, enriched &
always up to date data:
• Terminology
• Providers
• Demographics
• Clinical Guidelines
• Genes
• Measures, …
CASE STUDY: DEMAND FORECASTING OF ADMISSIONS FROM ED
Features from Structured Data
• How many patients will be admitted today?
• Data Source: EPIC Clarity data
Reason for visit
Age
Gender
Vital signs
Current wait time
Number of orders
Admit in past 30 days
Type of insurance
CASE STUDY: DEMAND FORECASTING OF ADMISSION FROM ED
Features from Natural Language Text
• A majority of the rich relevant content lies in unstructured notes that
are contributed by doctors and nurses from patient interactions.
• Data Source: Emergency Department Triage notes and other ED notes
Type of Pain
Intensity of Pain
Body part of region
Symptoms
Onset of symptoms
Attempted home remedy
Accuracy Baseline: Human manual prediction
ML with structured data
ML with NLP
USING SPARK NLP
• Homepage: https://nlp.johnsnowlabs.com
– Getting Started, Documentation, Examples, Videos, Blogs
– Join the Slack Community
• GitHub: https://github.com/johnsnowlabs/spark-nlp
– Open Issues & Feature Requests
– Contribute!
• The library has Scala and Python 2 & 3 API’s
• Get directly from maven-central or spark-packages
• Tested on all Spark 2.x versions
david@pacific.ai
@davidtalby
in/davidtalby
THANK YOU!

More Related Content

What's hot

Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
 
Post Quantum Cryptography: Technical Overview
Post Quantum Cryptography: Technical OverviewPost Quantum Cryptography: Technical Overview
Post Quantum Cryptography: Technical OverviewRamesh Nagappan
 
Long Short Term Memory
Long Short Term MemoryLong Short Term Memory
Long Short Term MemoryYan Xu
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Alia Hamwi
 
Autoencoder
AutoencoderAutoencoder
AutoencoderHARISH R
 
Deep learning in medicine: An introduction and applications to next-generatio...
Deep learning in medicine: An introduction and applications to next-generatio...Deep learning in medicine: An introduction and applications to next-generatio...
Deep learning in medicine: An introduction and applications to next-generatio...Allen Day, PhD
 
Machine Learning ppt.pptx
Machine Learning ppt.pptxMachine Learning ppt.pptx
Machine Learning ppt.pptx21MC048SARANRAJ
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment systemKOYELMAJUMDAR1
 

What's hot (20)

Data Analytics for IoT
Data Analytics for IoT Data Analytics for IoT
Data Analytics for IoT
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
 
Rnn and lstm
Rnn and lstmRnn and lstm
Rnn and lstm
 
Big Data
Big DataBig Data
Big Data
 
Hidden markov model ppt
Hidden markov model pptHidden markov model ppt
Hidden markov model ppt
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
 
Post Quantum Cryptography: Technical Overview
Post Quantum Cryptography: Technical OverviewPost Quantum Cryptography: Technical Overview
Post Quantum Cryptography: Technical Overview
 
Long Short Term Memory
Long Short Term MemoryLong Short Term Memory
Long Short Term Memory
 
Text MIning
Text MIningText MIning
Text MIning
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Recurrent neural networks
Recurrent neural networksRecurrent neural networks
Recurrent neural networks
 
IOT DATA AND BIG DATA
IOT DATA AND BIG DATAIOT DATA AND BIG DATA
IOT DATA AND BIG DATA
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
 
Deep learning in medicine: An introduction and applications to next-generatio...
Deep learning in medicine: An introduction and applications to next-generatio...Deep learning in medicine: An introduction and applications to next-generatio...
Deep learning in medicine: An introduction and applications to next-generatio...
 
Machine Learning ppt.pptx
Machine Learning ppt.pptxMachine Learning ppt.pptx
Machine Learning ppt.pptx
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment system
 

Similar to Natural Language Understanding in Healthcare

State of the Art Natural Language Processing at Scale with Alexander Thomas a...
State of the Art Natural Language Processing at Scale with Alexander Thomas a...State of the Art Natural Language Processing at Scale with Alexander Thomas a...
State of the Art Natural Language Processing at Scale with Alexander Thomas a...Databricks
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understandingDavid Talby
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021David Talby
 
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Databricks
 
NLP tutorial at AIME 2020
NLP tutorial at AIME 2020NLP tutorial at AIME 2020
NLP tutorial at AIME 2020Rui Zhang
 
The Translational Medicine
The Translational MedicineThe Translational Medicine
The Translational MedicineJoanne Luciano
 
Recent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health RecordRecent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health Recordkingstdio
 
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat... Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...Databricks
 
What if we never agree on a common health information model?
What if we never agree on a common health information model?What if we never agree on a common health information model?
What if we never agree on a common health information model?Koray Atalag
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...RussellHanson
 
2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introductiondvreeman
 
Data Mining in Rediology reports
Data Mining in Rediology reportsData Mining in Rediology reports
Data Mining in Rediology reportsSaeed Mehrabi
 
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Remedy Informatics
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022David Talby
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsDavid Talby
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
Qualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesQualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesAlexandra Howson MA, PhD, CHCP
 
Qualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesQualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesAlexandra Howson MA, PhD, CHCP
 
How deep learning reshapes medicine
How deep learning reshapes medicineHow deep learning reshapes medicine
How deep learning reshapes medicineHongyoon Choi
 

Similar to Natural Language Understanding in Healthcare (20)

State of the Art Natural Language Processing at Scale with Alexander Thomas a...
State of the Art Natural Language Processing at Scale with Alexander Thomas a...State of the Art Natural Language Processing at Scale with Alexander Thomas a...
State of the Art Natural Language Processing at Scale with Alexander Thomas a...
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understanding
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021
 
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
 
NLP tutorial at AIME 2020
NLP tutorial at AIME 2020NLP tutorial at AIME 2020
NLP tutorial at AIME 2020
 
The Translational Medicine
The Translational MedicineThe Translational Medicine
The Translational Medicine
 
Recent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health RecordRecent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health Record
 
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat... Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
What if we never agree on a common health information model?
What if we never agree on a common health information model?What if we never agree on a common health information model?
What if we never agree on a common health information model?
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
 
2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction
 
Data Mining in Rediology reports
Data Mining in Rediology reportsData Mining in Rediology reports
Data Mining in Rediology reports
 
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical Trials
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
Qualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesQualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slides
 
Qualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesQualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slides
 
How deep learning reshapes medicine
How deep learning reshapes medicineHow deep learning reshapes medicine
How deep learning reshapes medicine
 

More from David Talby

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...David Talby
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldDavid Talby
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...David Talby
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platformDavid Talby
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...David Talby
 
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection SystemArchitecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection SystemDavid Talby
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...David Talby
 

More from David Talby (9)

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platform
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
 
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection SystemArchitecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Natural Language Understanding in Healthcare

  • 1. Dr. David Talby NATURAL LANGUAGE UNDERSTANDING IN HEALTHCARE: STATE OF THE ART NLP, MACHINE LEARNING AND DEEP LEARNING WITH OPEN SOURCE SOFTWARE
  • 2. CONTENTS 1. THE IMPORTANCE OF NLP IN HEALTHCARE AI 2. NLP IS ULTRA DOMAIN SPECIFIC 3. STATE OF THE ART RESEARCH RESULTS 4. SPARK NLP: PRODUCTION GRADE & AT SCALE
  • 3. AI VS. DOCTORS Deep Learning Computer Vision Access to Care Diagnostic Accuracy
  • 4. NLP IN HEALTHCARE Deep Learning NLP Efficiency Accuracy Radiology Diagnostic Mental Health Safety Events Inpatient Pre- Auth Key Opinion Leaders Research Meta Analysis Clinical Coding Financial Anti- Fraud Adverse Events Drug Development Recruit for Trials
  • 5. RISK PREDICTION CASE STUDY: DETECTING SEPSIS “Compared to previous work that only used structured data such as vital signs and demographic information, utilizing free text drastically improves the discriminatory ability (increase in AUC from 0.67 to 0.86) of identifying infection.”
  • 6. COHORT SELECTION CASE STUDY: ONCOLOGY “Using the combination of structured and unstructured data, 8324 patients were identified as having advanced NSCLC. Of these patients, only 2472 were also in the cohort generated using structured data only. Further, 1090 patients would be included in the structured data only cohort who should have been excluded based on additional data.”
  • 7. CONTENTS 1. THE IMPORTANCE OF NLP IN HEALTHCARE AI 2. NLP IS ULTRA DOMAIN SPECIFIC 3. STATE OF THE ART RESEARCH RESULTS 4. SPARK NLP: PRODUCTION GRADE & AT SCALE
  • 8. WHAT MAKES LANGUAGE HARD • Nuanced – Sure / I agree / Absolutely! / Whatever / Yes sir / Just to see you smile ❤️ • Fuzzy – Blue, New, Tall, Child, Tell, Do • Contextual – “Patient denies alcohol abuse” • Medium specific – “SGTM c u in 15” • Domain specific – All forward-looking statements included in this document are based on information available to us on the date hereof, and we assume no obligation to revise or publicly release any revision to any such forward-looking statement, except as may otherwise be required by law.
  • 9. We all speak many languages. THE FIRST RULE OF NLP:
  • 10. ED Triage Notes states started last night, upper abd, took alka seltzer approx 0500, no relief. nausea no vomiting Since yeatreday 10/10 "constant Tylenol 1 hr ago. +nausea. diaphoretic. Mid abd radiates to back Generalized abd radiating to lower x 3 days accompanied by dark stools. Now with bloody stool this am. Denies dizzy, sob, fatigue. Visiting from Japan on business.” Features Type of Pain Intensity of Pain Body part of region Symptoms Onset of symptoms Attempted home remedy EMERGENCY ROOM LANGUAGE
  • 11. Different Vocabulary Different Grammar Different Context Different Meaning Different Language Models Tokenizer Normalizer Lemmatizer Fact Extraction Part of Speech Tagger Spell Checker Coreference Resolution Dependency Parser Sentence Splitting Negation Detection Named Entity Recognition Sentiment Analysis Intent Classification Summarization Word Embeddings Emotion Detection Question Answering Relevance Ranking Best Next Action Translation
  • 12. CONTENTS 1. THE IMPORTANCE OF NLP IN HEALTHCARE AI 2. NLP IS ULTRA DOMAIN SPECIFIC 3. STATE OF THE ART RESEARCH RESULTS 4. SPARK NLP: PRODUCTION GRADE & AT SCALE
  • 14. “DEEP LEARNED” NER “Entity Recognition from Clinical Texts via Recurrent Neural Network”. Liu et al., BMC Medical Informatics & Decision Making, July 2017. F-Score Dataset Task 85.81% 2010 i2b2 Medical concept extraction 92.29% 2012 i2b2 Clinical event detection 94.37% 2014 i2b2 De-identification
  • 16. “DEEP LEARNED” ENTITY RESOLUTION “CNN-based ranking for biomedical entity normalization”. Li et al., BMC Bioinformatics, October 2017. F-Score Dataset Task 90.30% ShARe / CLEF Disease & problem norm. 92.29% NCBI Disease norm. in literature
  • 17. ASSERTION STATUS DETECTION Prescribing sick days due to diagnosis of influenza. Positive Jane complains about flu-like symptoms. Speculative Jane’s RIDT came back clean. Negative Jane is at risk for flu if she’s not vaccinated. Conditional Jane’s older brother had the flu last month. Family history Jane had a severe case of flu last year. Patient history
  • 18. “DEEP LEARNED” ASSERTION STATUS DETECTION “Improving Classification of Medical Assertions in Clinical Notes“ Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011. Dataset Metric 94.17% 4th i2b2/VA Mirco-averaged F1 79.76% Marco-averaged F1
  • 20. BIOMEDICAL WORD EMBEDDINGS “How to Train Good Word Embeddings for Biomedical NLP”. Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011. • Intrinsic measures (similarity between concepts) • Extrinsic measures (improving NER) • Hyperparameter optimization • Sub-sampling • Minimum-count • Learning rate • Vector dimension • Context window size • Available under an open license here
  • 21. CONTENTS 1. THE IMPORTANCE OF NLP IN HEALTHCARE AI 2. NLP IS ULTRA DOMAIN SPECIFIC 3. STATE OF THE ART RESEARCH RESULTS 4. SPARK NLP: PRODUCTION GRADE & AT SCALE
  • 22. NLP FOR APACHE SPARK Data Sources API Spark Core API (RDD’s, Project Tungsten) Spark SQL API (DataFrame, Catalyst Optimizer) Spark ML API (Pipeline, Transformer, Estimator) Part of Speech Tagger Named Entity Recognition Sentiment Analysis Spell Checker Tokenizer Stemmer Lemmatizer Entity Extraction Topic Modeling Word2Vec TF-IDF String distance calculation N-grams calculation Stop word removal Train/Test & Cross-Validate Ensembles High Performance Natural Language Understanding at Scale Design Goals • State of the art Performance & Scale • Frictionless Reuse • Enterprise Grade Built on the Spark ML API’s Apache 2.0 Licensed Active development & support
  • 23. NLP FOR APACHE SPARK: COMBINED NLP & ML PIPELINES pipeline = pyspark.ml.Pipeline(stages=[ document_assembler, tokenizer, stemmer, normalizer, stopword_remover, tf, idf, lda]) topic_model = pipeline.fit(df) Spark NLP annotators Spark ML featurizers Spark ML LDA implementation Single execution plan for whole pipeline
  • 24. PERFORMANCE BENCHMARKS • Training was 80x faster to train on 2.6MB • Training was 38x faster on 100k • Training on 100k & 2.6MB took roughly the same • Additional near-linear speedup on a cluster • Prediction was 1.6x faster on 75MB • Prediction was 1.4x faster on 15MB • Adding NLP stages takes roughly the same • Additional near-linear speedup on a cluster
  • 25. HEALTHCARE EXTENSIONS Data Sources API Spark Core API (RDD’s, Project Tungsten) Spark SQL API (DataFrame, Catalyst Optimizer) Spark ML API (Pipeline, Transformer, Estimator) Part of Speech Tagger Named Entity Recognition Sentiment Analysis Spell Checker Tokenizer Stemmer Lemmatizer Entity Extraction Topic Modeling Word2Vec TF-IDF String distance calculation N-grams calculation Stop word removal Train/Test & Cross-Validate Ensembles High Performance Natural Language Understanding at Scale com.johnsnowlabs.nlp.clinical.* Healthcare specific NLP annotators for Spark in Scala, Java or Python: • Entity Recognition • Value Extraction • Word Embeddings • Assertion Status • Sentiment Analysis • Spell Checking, … data.johnsnowlabs.com/health 1,800+ Expert curated, clean, linked, enriched & always up to date data: • Terminology • Providers • Demographics • Clinical Guidelines • Genes • Measures, …
  • 26. CASE STUDY: DEMAND FORECASTING OF ADMISSIONS FROM ED Features from Structured Data • How many patients will be admitted today? • Data Source: EPIC Clarity data Reason for visit Age Gender Vital signs Current wait time Number of orders Admit in past 30 days Type of insurance
  • 27. CASE STUDY: DEMAND FORECASTING OF ADMISSION FROM ED Features from Natural Language Text • A majority of the rich relevant content lies in unstructured notes that are contributed by doctors and nurses from patient interactions. • Data Source: Emergency Department Triage notes and other ED notes Type of Pain Intensity of Pain Body part of region Symptoms Onset of symptoms Attempted home remedy Accuracy Baseline: Human manual prediction ML with structured data ML with NLP
  • 28. USING SPARK NLP • Homepage: https://nlp.johnsnowlabs.com – Getting Started, Documentation, Examples, Videos, Blogs – Join the Slack Community • GitHub: https://github.com/johnsnowlabs/spark-nlp – Open Issues & Feature Requests – Contribute! • The library has Scala and Python 2 & 3 API’s • Get directly from maven-central or spark-packages • Tested on all Spark 2.x versions

Editor's Notes

  1. There is not one “language” – every vertical and communication channel has its own jargon that includes vocabulary, grammar, assumptions and semantics. For example – in these ED triage notes, none of the sentences is in valid English, and the words “patient” and “pain” do not appear.