Spark NLP for Healthcare
Lessons Learned Building Real-World
Healthcare AI Systems
Veysel Kocaman
Sr. Data Scientist
Agenda
▪ Introducing Spark NLP
▪ Problem areas in healthcare analytics
▪ Solving healthcare-related NLP problems
▪ Case studies
Introducing Spark NLP
● Natural Language Toolkit (NLTK): the complete toolkit for all NLP techniques.
● TextBlob: easy-to-use NLP API, built on top of NLTK and Pattern.
● spaCy: industrial-strength NLP with Python and Cython.
● Gensim: Topic Modelling for Humans.
● Stanford CoreNLP: NLP services and packages by the Stanford NLP Group.
● fastText: NLP library by Facebook’s AI Research (FAIR) lab.
● ...
● Spark NLP is an open-source natural language processing library, built on top of Apache Spark and Spark ML (initial release: October 2017).
○ A single unified solution for all your NLP needs
○ Takes advantage of transfer learning and implements the latest and greatest SOTA algorithms and models in NLP research
○ Fills a gap: there was no NLP library fully supported by Spark
○ Delivers a mission-critical, enterprise-grade NLP library (used by multiple Fortune 500 companies)
○ Full-time development team (26 new releases in 2018, 30 new releases in 2019)
https://medium.com/spark-nlp/introduction-to-spark-nlp-foundations-and-basic-components-part-i-c83b7629ed59
Spark NLP Modules (Enterprise and Public)
Introducing Spark NLP
● Python, Java, Scala and R APIs
● “State of the art” means the best-performing academic peer-reviewed results
● Built on the Spark ML APIs
● Apache 2.0 licensed
● Active development & support
● Zero code changes to scale a pipeline to any Spark cluster
● The only open-source NLP library that is natively distributed
● Spark provides execution planning, caching, serialization and shuffling
Introducing Spark NLP
Sitting on the shoulders of Spark ML!
● Reusing the Spark ML Pipeline
○ Unified NLP & ML pipelines
○ End-to-end execution planning
○ Serializable
○ Distributable
● Reusing NLP Functionality
○ TF-IDF calculation
○ String distance calculation
○ Topic modeling
○ Distributed ML algorithms
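TF-IDF is one of the pieces of functionality Spark NLP reuses from Spark ML rather than reimplementing. To make the calculation concrete, here is a minimal plain-Python sketch (not Spark code) that uses raw term counts for TF and the smoothed IDF documented for Spark ML, idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) the number of documents containing t. The toy clinical sentences are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw count of a term in a document. IDF uses the smoothed
    formula log((m + 1) / (df + 1)), so a term that occurs in every
    document gets a weight of exactly 0.
    """
    m = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    idf = {term: math.log((m + 1) / (n + 1)) for term, n in df.items()}
    return [{term: count * idf[term] for term, count in Counter(doc).items()}
            for doc in docs]


docs = [
    ["patient", "denies", "chest", "pain"],
    ["patient", "reports", "chest", "pain"],
    ["patient", "has", "fever"],
]
weights = tf_idf(docs)
print(weights[0]["patient"])  # 0.0, because "patient" is in every document
print(weights[0]["denies"])   # log(4 / 2)
```

In Spark ML the same two steps are split across separate pipeline stages (term-frequency featurization followed by an IDF estimator), which is what lets them run distributed.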
Word & Sentence Embeddings
● GloVe (100d, 200d, 300d)
● ELMo (512d, 1024d)
● BERT (768d)
● Universal Sentence Encoders (512d)
Clinical Word Embeddings
● Clinical GloVe (200d)
● ICDO GloVe (200d)
● BioBERT: PubMed + PMC
● Clinical BERT: fine-tuned on PubMed + PMC + discharge summaries
● Corpora: PubMed + ICD10, UMLS + MIMIC III
PubMed + PMC: PubMed abstracts and PMC full-text articles
https://www.nlm.nih.gov/bsd/difference.html
Introducing Spark NLP
Pipeline of annotators
Spark NLP Pretrained Pipeline
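Conceptually, each annotator in a pipeline reads the output column of an earlier stage and appends a new column of its own, and the pipeline simply applies the stages in order. The sketch below mimics that flow in plain Python, with a list of dicts standing in for a Spark DataFrame. The `Annotator` class, `run_pipeline` and the column names are hypothetical illustrations, not Spark NLP's API.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Annotator:
    """Hypothetical pipeline stage: reads one column and writes another."""
    input_col: str
    output_col: str
    fn: Callable

    def transform(self, row):
        row = dict(row)  # copy so the caller's row is not mutated
        row[self.output_col] = self.fn(row[self.input_col])
        return row


def run_pipeline(stages, rows):
    """Apply stages in order; each stage sees columns added by earlier ones."""
    for stage in stages:
        rows = [stage.transform(r) for r in rows]
    return rows


def split_sentences(doc):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", doc) if s]


def tokenize(sentences):
    # Words, plus each punctuation mark as its own token.
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences]


stages = [
    Annotator("text", "document", str.strip),
    Annotator("document", "sentence", split_sentences),
    Annotator("sentence", "token", tokenize),
]
out = run_pipeline(stages, [{"text": " No fever. Mild cough. "}])
print(out[0]["token"])  # [['No', 'fever', '.'], ['Mild', 'cough', '.']]
```

Because every stage only declares its input and output columns, stages can be swapped or reused across pipelines, which is the property the Spark ML pipeline design provides at cluster scale.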
Spark is like a locomotive racing a bicycle. The bike will win if the load is light: it is quicker to accelerate and more agile. With a heavy load, the locomotive might take a while to get up to speed, but it is going to be faster in the end.
LightPipelines are Spark ML pipelines converted into a single-machine, multithreaded task, making them more than 10x faster for smaller amounts of data (small is relative, but roughly 50k sentences is a good maximum).
Spark NLP Light Pipelines
Faster inference at runtime from Spark NLP pipelines
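The idea behind a LightPipeline (skip cluster scheduling for small inputs and run the already-fitted pipeline locally across threads) can be sketched with Python's standard thread pool. This is only an analogy under stated assumptions: `annotate` is a hypothetical stand-in for a fitted pipeline, and the code does not use or reproduce Spark NLP's `LightPipeline` class.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate(text):
    """Hypothetical stand-in for a fitted pipeline: lowercase tokens."""
    return {"text": text, "tokens": text.lower().replace(".", " .").split()}


def light_annotate(texts, max_workers=4):
    """Annotate a small batch on one machine using a thread pool.

    No cluster scheduling or data shuffling is involved, which is why this
    style wins for small inputs. executor.map returns results in the same
    order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(annotate, texts))


results = light_annotate(["Patient denies fever.", "No chest pain."])
print(results[1]["tokens"])  # ['no', 'chest', 'pain', '.']
```

For large corpora the trade-off flips, as the locomotive analogy suggests: the fixed overhead of distributing work is repaid once the load is heavy.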
Spark NLP in Healthcare
[Figure: Healthcare data: raw & unstructured data vs. clean & structured data]
● Less than 50% of structured data and less than 1% of unstructured data is being leveraged for decision-making in companies (HBR). This is even worse in healthcare.
● NLP is highly domain-specific, so train your own models.
Spark NLP in Healthcare
"(admission): 50.4 kg\n Height: 61 Inch\n ICP: 7 (1 - 14) mmHg\n Total In:\n 3,279 mL\n 911 mL\n PO:\n Tube feeding:\n 243 mL\n 237 mL\n IV Fluid:\n
2,827 mL\n 624 mL\n Blood products:\n Total out:\n 2,333 mL\n 370 mL\n Urine:\n 2,330 mL\n 370 mL\n NG:\n Stool:\n Drains:\n 3 mL\n Balance:\n
946 mL\n 541 mL\n Respiratory support\n O2 Delivery Device: None\n SPO2: 97%\n ABG: ///26/\n Physical Examination\n General Appearance: No acute
distress, Non communicative due to\n language barrier\n HEENT: PERRL, EOMI\n Cardiovascular: (Rhythm: Regular)\n Respiratory / Chest: (Expansion:
Symmetric), (Breath Sounds: CTA\n bilateral : ), (Sternum: Stable )\n Abdominal: Soft, Non-distended, Non-tender, Bowel sounds present\n Left
Extremities: (Edema: Absent), (Temperature: Warm), (Pulse -\n Dorsalis pedis: Present), (Pulse - Posterior tibial: Present)\n Right Extremities:
(Edema: Absent), (Temperature: Warm), (Pulse -\n Dorsalis pedis: Present), (Pulse - Posterior tibial: Present)\n Skin: (Incision: Clean / Dry /
Intact)\n Neurologic: (Awake / Alert / Oriented: x 2), Follows simple commands,\n Moves all extremities, Limited due to language barrier\n Labs /
Radiology\n 275 K/uL\n 9.8 g/dL\n 134 mg/dL\n 0.4 mg/dL\n 26 mEq/L\n 3.5 mEq/L\n 15 mg/dL\n 102 mEq/L\n 137 mEq/L\n 30.3 %\n 8.8 K/uL\n [image002.jpg]\n
[**2140-7-23**] 03:30 PM\n [**2140-7-24**] 02:51 AM\n [**2140-7-24**] 03:03 AM\n [**2140-7-24**] 08:13 AM\n [**2140-7-24**] 10:07 AM\n
[**2140-7-25**] 02:45 AM\n [**2140-7-26**] 01:15 AM\n [**2140-7-27**] 03:09 AM\n [**2140-7-27**] 10:58 AM\n [**2140-7-28**] 02:58 AM\n WBC\n
9.7\n 10.3\n 11.2\n 7.7\n 7.1\n 8.8\n Hct\n 31.8\n 32.6\n 34.3\n 33.3\n 31.4\n 30.3\n Plt\n [**Telephone/Fax (3) 8785**]\n Creatinine\n 0.5\n 0.5\n
0.5\n 0.5\n 0.5\n 0.5\n 0.4\n TCO2\n 26\n 28\n 29\n Glucose\n 168\n 253\n 147\n 180\n 92\n 160\n 194\n 134\n Other labs: PT / PTT /
INR:11.6/25.8/1.0, CK / CK-MB / Troponin\n T:54//<0.01, ALT / AST:25/32, Alk-Phos / T bili:87/,\n Differential-Neuts:93.0 %, Lymph:5.3 %,
Mono:1.0 %, Eos:0.5 %, Lactic\n Acid:1.5 mmol/L, Ca:7.9 mg/dL, Mg:1.8 mg/dL, PO4:2.5 mg/dL\n Assessment and Plan\n AIRWAY, INABILITY TO PROTECT
(RISK FOR ASPIRATION, ALTERED GAG, AIRWAY\n CLEARANCE, COUGH), CVA (STROKE, CEREBRAL INFARCTION), HEMORRHAGIC ,\n HYPERTENSION, BENIGN,
[**Last Name 12**] PROBLEM - ENTER DESCRIPTION IN COMMENTS\n Assessment and Plan: 69 yo F w/ left cerebellar thrombotic stroke,\n hemorrhage,
transtentorial herniation s/p EVD placement, surgical\n decompression on [**7-22**], now w/ improved neuro exams\n Neurologic: ICP monitor, Pain
controlled, s/p crani for cerebellar\n CVA, moves all 4, EVD clamped."
Output from one of the NLP libraries - MIMIC-III dataset
(an openly available dataset developed by the MIT Lab for Computational Physiology)
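To see why generic tooling struggles with a note like this, here is a minimal sketch that pulls value/unit pairs out of a few fragments of the excerpt above with a regular expression. The unit list is a hand-picked assumption that covers only these fragments; the point is how quickly such rules break, not a recommended parser.

```python
import re

# A few fragments of the MIMIC-III note above, newlines included.
note = ("(admission): 50.4 kg\n Height: 61 Inch\n ICP: 7 (1 - 14) mmHg\n"
        " Total In:\n 3,279 mL\n 911 mL\n SPO2: 97%\n"
        " Other labs: Ca:7.9 mg/dL, Mg:1.8 mg/dL, PO4:2.5 mg/dL")

# Assumed unit vocabulary, just large enough for this excerpt.
UNIT = r"(kg|Inch|mmHg|mL|mg/dL|%)"
# A number (optional thousands commas and decimals) directly followed by a unit.
MEASURE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*" + UNIT)


def measurements(text):
    """Return (value, unit) pairs found on the non-empty lines of text."""
    lines = [ln.strip() for ln in text.split("\n") if ln.strip()]
    return [(float(value.replace(",", "")), unit)
            for ln in lines for value, unit in MEASURE.findall(ln)]


pairs = measurements(note)
print(pairs)
# The ICP reading is silently missed: its unit follows a reference range
# "(1 - 14)", not the value, so the pattern never fires on it.
```

Every new layout quirk (reference ranges, values split from their labels across lines, de-identification placeholders) needs another rule, which is the practical argument for training domain-specific models instead.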
Spark NLP in Healthcare
NLP Library / Feature: State of the Art (SOTA) Research
● Named Entity Recognition: “Entity Recognition from Clinical Texts via Recurrent Neural Network”. Liu et al., BMC Medical Informatics & Decision Making, July 2017.
● Word Embeddings:
○ “How to Train Good Word Embeddings for Biomedical NLP”. Chiu et al., In Proceedings of BioNLP’16, August 2016.
○ “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. Devlin et al. (Google Research), October 2018.
● Assertion Status Detection:
○ “Improving Classification of Medical Assertions in Clinical Notes”. Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011.
○ “Neural Networks For Negation Scope Detection”. Fancellu et al., In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016.
● Entity Resolution: “CNN-based ranking for biomedical entity normalization”. Li et al., BMC Bioinformatics, October 2017.
Clinical Named Entity Recognition
● Posology NER
● Anatomy NER
● PHI NER
● Clinical NER
NER comparison with Amazon Comprehend Medical
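Posology NER labels drug, dosage, route and frequency mentions with character offsets. To illustrate the shape of that output only, the sketch below swaps the trained model for a few hand-written regular expressions; the patterns and the three-drug lexicon are hypothetical, and a rule set like this generalizes far worse than a learned NER model.

```python
import re

# Hypothetical patterns and a tiny drug lexicon, for illustration only.
PATTERNS = {
    "DRUG": r"\b(?:metformin|insulin|aspirin)\b",
    "DOSAGE": r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|mL)\b",
    "ROUTE": r"\b(?:by mouth|orally|IV|subcutaneously)\b",
    "FREQUENCY": r"\b(?:once|twice|three times) (?:a day|daily)\b",
}


def extract_posology(text):
    """Return (label, chunk, start, end) tuples ordered by position in text."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((label, m.group(), m.start(), m.end()))
    return sorted(found, key=lambda e: e[2])


entities = extract_posology("Metformin 500 mg by mouth twice daily.")
for label, chunk, start, end in entities:
    print(f"{label:<10} {chunk!r} [{start}:{end}]")
```

The (label, chunk, start, end) layout mirrors how NER results are typically reported, so downstream steps such as assertion detection or entity resolution can refer back to exact spans in the note.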
Clinical Assertion Model
● “Prescribing sick days due to diagnosis of influenza.” → Present
● “41 yo man with CRFs of DM Type II, high cholesterol, smoking history, family hx, HTN p/w episodes of atypical CP x 1 week, with rest and exertion.” → Conditional
● “Jane’s RIDT came back clean.” → Absent
● “Jane is at risk for flu if she’s not vaccinated.” → Hypothetical
● “There was a dense hemianopsia on the left side.” → Present
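The examples above map sentences to assertion labels. A trained assertion model learns these distinctions from labeled text; as a rough, swapped-in illustration, the sketch below uses cue-phrase matching in the spirit of rule-based systems such as NegEx. The cue lists and their priority order are assumptions chosen so the slide's examples come out right, not a validated lexicon.

```python
import re

# Illustrative cue lists (assumptions), checked in this priority order.
CUES = [
    ("Hypothetical", ["at risk for", "if", "in case of"]),
    ("Absent", ["no", "denies", "negative for", "came back clean", "without"]),
    ("Conditional", ["on exertion", "with exertion", "and exertion"]),
]


def assertion_status(sentence):
    """Return the first label whose cue occurs as whole words, else Present."""
    text = sentence.lower()
    for label, cues in CUES:
        for cue in cues:
            if re.search(r"\b" + re.escape(cue) + r"\b", text):
                return label
    return "Present"


examples = [
    "Prescribing sick days due to diagnosis of influenza.",
    "Jane’s RIDT came back clean.",
    "Jane is at risk for flu if she’s not vaccinated.",
    "There was a dense hemianopsia on the left side.",
]
for s in examples:
    print(assertion_status(s), "<-", s)
```

Priority matters: the hypothetical sentence also contains "not", so checking hypothetical cues first is what keeps it from being labeled Absent. Interactions like this are hard to hand-tune, which is one reason learned models outperform cue lists.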
“Neural Networks For Negation Scope Detection”. Fancellu et al., In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016.
Scope of negation: given a negative instance, identify which tokens are affected by the negation.
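To make the scope-detection task concrete, here is a minimal window-based baseline: tokens after a negation cue are in scope until a terminator or the window limit is reached. This is a simple heuristic substituted for illustration, not the neural approach of Fancellu et al.; the cue set, terminator set and window size of 5 are all assumptions.

```python
# Assumed cue and terminator sets, for illustration only.
NEGATION_CUES = {"no", "not", "denies", "without"}
SCOPE_TERMINATORS = {"but", "however", ",", "."}


def negation_scope(tokens, window=5):
    """Return indices of tokens affected by a negation cue.

    Scope starts right after each cue and extends up to `window` tokens,
    stopping early at the first terminator.
    """
    in_scope = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATION_CUES:
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j].lower() in SCOPE_TERMINATORS:
                    break
                in_scope.add(j)
    return sorted(in_scope)


tokens = ["Patient", "denies", "chest", "pain", "but", "reports", "nausea", "."]
print([tokens[i] for i in negation_scope(tokens)])  # ['chest', 'pain']
```

A fixed window cannot handle scopes that extend leftward or span nested clauses, which is the gap that sequence models for negation scope aim to close.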
Clinical Deidentification Model
* Identifies potential pieces of content containing personal information about patients and removes them, replacing them with semantic tags.
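The replace-with-semantic-tags step can be shown with a short sketch. Here a few regular expressions stand in for the PHI NER model that a real de-identification pipeline would use to find the spans; the patterns, tag names and sample note are all hypothetical.

```python
import re

# Hypothetical PHI patterns; a production system detects PHI with a
# trained NER model instead of a handful of regular expressions.
PHI_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("NAME", re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s[A-Z][a-z]+\b")),
]


def deidentify(text):
    """Replace each detected PHI span with a tag such as <DATE>."""
    for tag, pattern in PHI_PATTERNS:
        text = pattern.sub(f"<{tag}>", text)
    return text


note = "Mrs. Smith was seen by Dr. Jones on 03/14/2019; call 555-123-4567."
print(deidentify(note))
# <NAME> was seen by <NAME> on <DATE>; call <PHONE>.
```

Keeping a semantic tag rather than deleting the span preserves sentence structure, so downstream NLP models still see that a date or a person was mentioned.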
Entity Resolvers Model
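Entity resolution maps an extracted mention to a code in a standard terminology. The sketch below shows the idea with token-overlap (Jaccard) similarity against a three-entry subset of ICD-10-CM; both the similarity measure and the tiny terminology are illustrative assumptions, not how Spark NLP's resolvers work internally.

```python
import re

# Tiny illustrative subset of ICD-10-CM; a real resolver searches the
# full terminology.
ICD10_SUBSET = {
    "I10": "Essential (primary) hypertension",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "E78.5": "Hyperlipidemia, unspecified",
}


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def resolve(entity, terminology=ICD10_SUBSET):
    """Return (code, description, score) for the best token-overlap match."""
    query = tokens(entity)
    code, desc = max(terminology.items(),
                     key=lambda item: jaccard(query, tokens(item[1])))
    return code, desc, jaccard(query, tokens(desc))


print(resolve("diabetes mellitus type 2"))  # best match: E11.9
print(resolve("essential hypertension"))    # best match: I10
```

Surface overlap fails on synonyms: "high cholesterol" shares no tokens with "Hyperlipidemia, unspecified". Handling such cases is why resolvers generally compare learned representations rather than raw strings.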
Customer Case Studies
1. How SelectData uses AI to better understand home health patients
2. How Roche automated knowledge extraction from pathology and radiology reports
3. Improving patient flow forecasting at Kaiser Permanente
4. How Deep6 accelerates clinical trial recruitment
SelectData
What is Home Health, and what are the upcoming problems?
Silver Tsunami
● By 2022, more than 25 percent of US workers will be 55 or older
● Nearly 10,000 baby boomers reach retirement age each day
● Home Health is expected to grow by 6.7% next year
Expert Reviewer
● The Bureau of Labor Statistics projects that the need for medical coders will increase by 15% by 2027
● Healthcare data is used in decision-making
Aging Baby Boomers
● By 2039, the rate of Medicare spending and net interest on the national debt will exceed total projected revenues
● Payment reform focused on reduction in price
SelectData
Problems vs Solutions
TL;DR => we have more people, fewer qualified workers, and our clients are receiving less money for the care of each patient.
SelectData
● OCR is difficult: different layouts, different scales, noise, rotation.
● High number of records and pages.
● Need for cluster processing.
● Cluster processing is difficult.
SelectData Spark OCR
SelectData
● We create a pipeline composed of annotators.
● The pipeline runs in a cluster.
● We can process many documents in parallel and scale out.
SelectData
Document Assembler and Tokenizer
SelectData
Spell Checker
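A common building block of spell checking is edit distance: propose the vocabulary word reachable with the fewest single-character edits. The sketch below implements Levenshtein distance with dynamic programming and a nearest-word corrector. The five-word vocabulary and the `max_distance=2` cutoff are assumptions; a practical clinical spell checker would also use a large domain lexicon, word frequencies and context.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


# Hypothetical clinical vocabulary, for illustration only.
VOCAB = ["hypertension", "hemorrhage", "cerebellar", "creatinine", "glucose"]


def correct(word, vocab=VOCAB, max_distance=2):
    """Return the closest vocabulary word, or the input if none is close."""
    best = min(vocab, key=lambda v: levenshtein(word.lower(), v))
    return best if levenshtein(word.lower(), best) <= max_distance else word


print(correct("hemorhage"))     # hemorrhage (one missing letter)
print(correct("hypertensoin"))  # hypertension (two substitutions)
```

Note that plain Levenshtein counts a transposition such as "oi" vs "io" as two edits; variants that count it as one (Damerau-Levenshtein) are often preferred for typing errors.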
SelectData
Entity Resolution
Case 2: Roche
Manual curation is extremely time-consuming, expensive and prone to errors
Manually Curated TCGA Report
Sample Results from Curation
Case 2: Roche
1. Natural Language Processing (NLP):
● High accuracy
● Specialized for medical data
● Minimize time to train new models
● Extensible for new content types
2. Optical Character Recognition (OCR):
● High accuracy
● Retain document structure (e.g. tables, lists, paragraphs, ...)
Requirements for both:
● Scalable (support 10 million pathology reports per year)
● Compliant with privacy laws
● Integrates easily with AWS services
● Low cost
The NAVIFY team identified two significant needs
Action Plan:
● Initial goal of speeding up review of pathology reports
● Will then automate extraction of high-confidence entities and relationships
● Will keep increasing automation of NLP over time
Case 2: Roche: How did Spark NLP help Roche?
Case 2: Roche
Lessons Learned
● Extracting text from domain specific PDFs/images is unpredictable
● Quantitative evaluation of OCR is challenging
● Bridging the gap between domain knowledge & NLP requires consensus
● Evidence does not always match standard terminologies
● Building generalizable NLP pipelines:
○ Static components like tokenization, sentence detection, POS tagging and chunking can be reused
○ Data sources (hospitals) differ, so the NLP approach needs to be plug-and-play
Case 3: Kaiser Permanente
Improving Patient Flow Forecasting
Objectives
Optimize the patient flow models and provide insights for real-time decision-making and strategic planning by predicting:
● Bed demand
● 'Safe' staffing levels
● Hospital gridlock
Case 3: Kaiser Permanente
Case 4: Deep6
Feature engineering with Spark NLP to accelerate clinical trial recruitment (reducing the time it takes to find patients for trials)
● Your treatments are more than 15 years old
● Cutting-edge treatments are only available in clinical trials
● Faster cycles make lifesaving treatments available sooner
Case 4: Deep6
Spark NLP resources
Spark NLP Official page
Spark NLP Workshop Repo
JSL Youtube channel
JSL Blogs
Introduction to Spark NLP: Foundations and Basic Components (Part-I)
Introduction to Spark NLP: Installation and Getting Started (Part-II)
Named Entity Recognition with Bert in Spark NLP
Text Classification in Spark NLP with Bert and Universal Sentence Encoders
Spark NLP 101 : Document Assembler
Spark NLP 101: LightPipeline
https://www.oreilly.com/radar/one-simple-chart-who-is-interested-in-spark-nlp/
https://blog.dominodatalab.com/comparing-the-functionality-of-open-source-natural-language-processing-libraries/
https://databricks.com/blog/2017/10/19/introducing-natural-language-processing-library-apache-spark.html
https://databricks.com/fr/session/apache-spark-nlp-extending-spark-ml-to-deliver-fast-scalable-unified-natural-language-processing
https://medium.com/@saif1988/spark-nlp-walkthrough-powered-by-tensorflow-9965538663fd
https://www.kdnuggets.com/2019/06/spark-nlp-getting-started-with-worlds-most-widely-used-nlp-library-enterprise.html
https://www.forbes.com/sites/forbestechcouncil/2019/09/17/winning-in-health-care-ai-with-small-data/#1b2fc2555664
https://medium.com/hackernoon/mueller-report-for-nerds-spark-meets-nlp-with-tensorflow-and-bert-part-1-32490a8f8f12
https://www.analyticsindiamag.com/5-reasons-why-spark-nlp-is-the-most-widely-used-library-in-enterprises/
https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-training-spark-nlp-and-spacy-pipelines
https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-accuracy-performance-and-scalability
https://www.infoworld.com/article/3031690/analytics/why-you-should-use-spark-for-machine-learning.html
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthcare AI Systems

More Related Content

What's hot

Top Rumors About Apple March 21 Big Event
Top Rumors About Apple March 21 Big EventTop Rumors About Apple March 21 Big Event
Top Rumors About Apple March 21 Big Event
ChromeInfo Technologies
 
COVID-19 Rapid Response Checklist for Nonprofits
COVID-19 Rapid Response Checklist for NonprofitsCOVID-19 Rapid Response Checklist for Nonprofits
COVID-19 Rapid Response Checklist for Nonprofits
Boston Consulting Group
 
Developing an Intranet Strategy
Developing an Intranet StrategyDeveloping an Intranet Strategy
Developing an Intranet Strategy
DNN
 
Here’s The Deck Andy Raskin Called “The Greatest Sales Pitch I’ve Seen All Year”
Here’s The Deck Andy Raskin Called “The Greatest Sales Pitch I’ve Seen All Year”Here’s The Deck Andy Raskin Called “The Greatest Sales Pitch I’ve Seen All Year”
Here’s The Deck Andy Raskin Called “The Greatest Sales Pitch I’ve Seen All Year”
Drift
 
Q3 2022 DBX Investor Presentation.pdf
Q3 2022 DBX Investor Presentation.pdfQ3 2022 DBX Investor Presentation.pdf
Q3 2022 DBX Investor Presentation.pdf
Dropbox
 
Modern Marketing Data Capabilities.docx
Modern Marketing Data Capabilities.docxModern Marketing Data Capabilities.docx
Modern Marketing Data Capabilities.docx
SasikalaKumaravel2
 
Bcg true luxury global cons insight 2017 - presentata
Bcg true luxury global cons insight 2017 - presentataBcg true luxury global cons insight 2017 - presentata
Bcg true luxury global cons insight 2017 - presentata
Gabriela Otto
 
Solve for X with AI: a VC view of the Machine Learning & AI landscape
Solve for X with AI: a VC view of the Machine Learning & AI landscapeSolve for X with AI: a VC view of the Machine Learning & AI landscape
Solve for X with AI: a VC view of the Machine Learning & AI landscape
Ed Fernandez
 
Tim Daines, QuantumBlack
Tim Daines, QuantumBlackTim Daines, QuantumBlack
Tim Daines, QuantumBlack
Mad*Pow
 
원격의료 시대의 디지털 치료제
원격의료 시대의 디지털 치료제원격의료 시대의 디지털 치료제
원격의료 시대의 디지털 치료제
Yoon Sup Choi
 
The State of Global AI Adoption in 2023
The State of Global AI Adoption in 2023The State of Global AI Adoption in 2023
The State of Global AI Adoption in 2023
InData Labs
 
Using AI for Learning.pptx
Using AI for Learning.pptxUsing AI for Learning.pptx
Using AI for Learning.pptx
GDSCUOWMKDUPG
 
SXSW 2016 takeaways
SXSW 2016 takeawaysSXSW 2016 takeaways
SXSW 2016 takeaways
Havas
 
Accenture Tech Vision 2020 - Trend 3
Accenture Tech Vision 2020 - Trend 3Accenture Tech Vision 2020 - Trend 3
Accenture Tech Vision 2020 - Trend 3
accenture
 
Brazil Digital Report - 1st Edition By McKinsey & Company and Brazil at Silic...
Brazil Digital Report - 1st Edition By McKinsey & Company and Brazil at Silic...Brazil Digital Report - 1st Edition By McKinsey & Company and Brazil at Silic...
Brazil Digital Report - 1st Edition By McKinsey & Company and Brazil at Silic...
Ana Lucia Amaral
 
How a Strong Brand Boosts B2B Demand
How a Strong Brand Boosts B2B DemandHow a Strong Brand Boosts B2B Demand
How a Strong Brand Boosts B2B Demand
GYK Antler
 
The Five Biggest Tech Trends Transforming Government In 2022
The Five Biggest Tech Trends Transforming Government In 2022The Five Biggest Tech Trends Transforming Government In 2022
The Five Biggest Tech Trends Transforming Government In 2022
Bernard Marr
 
AI in healthcare - SF Bay ACM chapter
AI in healthcare - SF Bay ACM chapterAI in healthcare - SF Bay ACM chapter
AI in healthcare - SF Bay ACM chapter
Alex Ermolaev
 
"How Knowledge Management Promotes Organizational Excellence and Success" Web...
"How Knowledge Management Promotes Organizational Excellence and Success" Web..."How Knowledge Management Promotes Organizational Excellence and Success" Web...
"How Knowledge Management Promotes Organizational Excellence and Success" Web...
Naseej Academy أكاديمية نسيج
 

What's hot (20)

Top Rumors About Apple March 21 Big Event
Top Rumors About Apple March 21 Big EventTop Rumors About Apple March 21 Big Event
Top Rumors About Apple March 21 Big Event
 
Binpress
BinpressBinpress
Binpress
 
COVID-19 Rapid Response Checklist for Nonprofits
COVID-19 Rapid Response Checklist for NonprofitsCOVID-19 Rapid Response Checklist for Nonprofits
COVID-19 Rapid Response Checklist for Nonprofits
 
Developing an Intranet Strategy
Developing an Intranet StrategyDeveloping an Intranet Strategy
Developing an Intranet Strategy
 
Here’s The Deck Andy Raskin Called “The Greatest Sales Pitch I’ve Seen All Year”
Here’s The Deck Andy Raskin Called “The Greatest Sales Pitch I’ve Seen All Year”Here’s The Deck Andy Raskin Called “The Greatest Sales Pitch I’ve Seen All Year”
Here’s The Deck Andy Raskin Called “The Greatest Sales Pitch I’ve Seen All Year”
 
Q3 2022 DBX Investor Presentation.pdf
Q3 2022 DBX Investor Presentation.pdfQ3 2022 DBX Investor Presentation.pdf
Q3 2022 DBX Investor Presentation.pdf
 
Modern Marketing Data Capabilities.docx
Modern Marketing Data Capabilities.docxModern Marketing Data Capabilities.docx
Modern Marketing Data Capabilities.docx
 
Bcg true luxury global cons insight 2017 - presentata
Bcg true luxury global cons insight 2017 - presentataBcg true luxury global cons insight 2017 - presentata
Bcg true luxury global cons insight 2017 - presentata
 
Solve for X with AI: a VC view of the Machine Learning & AI landscape
Solve for X with AI: a VC view of the Machine Learning & AI landscapeSolve for X with AI: a VC view of the Machine Learning & AI landscape
Solve for X with AI: a VC view of the Machine Learning & AI landscape
 
Tim Daines, QuantumBlack
Tim Daines, QuantumBlackTim Daines, QuantumBlack
Tim Daines, QuantumBlack
 
원격의료 시대의 디지털 치료제
원격의료 시대의 디지털 치료제원격의료 시대의 디지털 치료제
원격의료 시대의 디지털 치료제
 
The State of Global AI Adoption in 2023
The State of Global AI Adoption in 2023The State of Global AI Adoption in 2023
The State of Global AI Adoption in 2023
 
Using AI for Learning.pptx
Using AI for Learning.pptxUsing AI for Learning.pptx
Using AI for Learning.pptx
 
SXSW 2016 takeaways
SXSW 2016 takeawaysSXSW 2016 takeaways
SXSW 2016 takeaways
 
Accenture Tech Vision 2020 - Trend 3
Accenture Tech Vision 2020 - Trend 3Accenture Tech Vision 2020 - Trend 3
Accenture Tech Vision 2020 - Trend 3
 
Brazil Digital Report - 1st Edition By McKinsey & Company and Brazil at Silic...
Brazil Digital Report - 1st Edition By McKinsey & Company and Brazil at Silic...Brazil Digital Report - 1st Edition By McKinsey & Company and Brazil at Silic...
Brazil Digital Report - 1st Edition By McKinsey & Company and Brazil at Silic...
 
How a Strong Brand Boosts B2B Demand
How a Strong Brand Boosts B2B DemandHow a Strong Brand Boosts B2B Demand
How a Strong Brand Boosts B2B Demand
 
The Five Biggest Tech Trends Transforming Government In 2022
The Five Biggest Tech Trends Transforming Government In 2022The Five Biggest Tech Trends Transforming Government In 2022
The Five Biggest Tech Trends Transforming Government In 2022
 
AI in healthcare - SF Bay ACM chapter
AI in healthcare - SF Bay ACM chapterAI in healthcare - SF Bay ACM chapter
AI in healthcare - SF Bay ACM chapter
 
"How Knowledge Management Promotes Organizational Excellence and Success" Web...
"How Knowledge Management Promotes Organizational Excellence and Success" Web..."How Knowledge Management Promotes Organizational Excellence and Success" Web...
"How Knowledge Management Promotes Organizational Excellence and Success" Web...
 

Similar to Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthcare AI Systems

Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in Healthcare
David Talby
 
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
The Statistical and Applied Mathematical Sciences Institute
 
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat... Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
Databricks
 
How can we har­ness the Human Brain Project to max­i­mize its future health a...
How can we har­ness the Human Brain Project to max­i­mize its future health a...How can we har­ness the Human Brain Project to max­i­mize its future health a...
How can we har­ness the Human Brain Project to max­i­mize its future health a...
SharpBrains
 
Connected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul GrothConnected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul Groth
Connected Data World
 
ENCEPHALOGRAPHY PANKAJ.pptx
ENCEPHALOGRAPHY PANKAJ.pptxENCEPHALOGRAPHY PANKAJ.pptx
ENCEPHALOGRAPHY PANKAJ.pptx
preeminentbot
 
Computer-Aided Detection (1).pptx
Computer-Aided Detection (1).pptxComputer-Aided Detection (1).pptx
Computer-Aided Detection (1).pptx
MohammedMasliuddin
 
Non intrusive-devices
Non intrusive-devicesNon intrusive-devices
Non intrusive-devices
Unesco Telemedicine
 
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
PhD Assistance
 
2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introductiondvreeman
 
Natural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsNatural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health Records
MMS Holdings
 
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
PhD Assistance
 
Reference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxReference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptx
Chimezie Ogbuji
 
NC_Fall_14_web
NC_Fall_14_webNC_Fall_14_web
NC_Fall_14_webErica Kube
 
DRIVE 2017 | 25 October - THE HUMAN TOUCH - Meaningful Data & Smart Fashion
DRIVE 2017 | 25 October - THE HUMAN TOUCH - Meaningful Data & Smart FashionDRIVE 2017 | 25 October - THE HUMAN TOUCH - Meaningful Data & Smart Fashion
DRIVE 2017 | 25 October - THE HUMAN TOUCH - Meaningful Data & Smart Fashion
CLICKNL
 
[DigiHealth 22] Artificial intelligence in medicine - Kristijan Saric
[DigiHealth 22] Artificial intelligence in medicine - Kristijan Saric[DigiHealth 22] Artificial intelligence in medicine - Kristijan Saric
[DigiHealth 22] Artificial intelligence in medicine - Kristijan Saric
DataScienceConferenc1
 
Automated and Explainable Deep Learning for Clinical Language Understanding a...
Automated and Explainable Deep Learning for Clinical Language Understanding a...Automated and Explainable Deep Learning for Clinical Language Understanding a...
Automated and Explainable Deep Learning for Clinical Language Understanding a...
Databricks
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
European Bioinformatics Institute
 
Cao report 2007-2012
Cao report 2007-2012Cao report 2007-2012
Cao report 2007-2012
Elif Ceylan
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biology
robertstevens65
 

Similar to Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthcare AI Systems (20)

Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in Healthcare
 
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
 
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat... Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 
How can we har­ness the Human Brain Project to max­i­mize its future health a...
How can we har­ness the Human Brain Project to max­i­mize its future health a...How can we har­ness the Human Brain Project to max­i­mize its future health a...
How can we har­ness the Human Brain Project to max­i­mize its future health a...
 
Connected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul GrothConnected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul Groth
 
ENCEPHALOGRAPHY PANKAJ.pptx
ENCEPHALOGRAPHY PANKAJ.pptxENCEPHALOGRAPHY PANKAJ.pptx
ENCEPHALOGRAPHY PANKAJ.pptx
 
Computer-Aided Detection (1).pptx
Computer-Aided Detection (1).pptxComputer-Aided Detection (1).pptx
Computer-Aided Detection (1).pptx
 
Non intrusive-devices
Non intrusive-devicesNon intrusive-devices
Non intrusive-devices
 
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
 
2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction
 
Natural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsNatural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health Records
 
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
Future of Natural Language Processing - Potential Lists of Topics for PhD stu...
 
Reference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxReference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptx
 
NC_Fall_14_web
NC_Fall_14_webNC_Fall_14_web
NC_Fall_14_web
 
DRIVE 2017 | 25 October - THE HUMAN TOUCH - Meaningful Data & Smart Fashion
DRIVE 2017 | 25 October - THE HUMAN TOUCH - Meaningful Data & Smart FashionDRIVE 2017 | 25 October - THE HUMAN TOUCH - Meaningful Data & Smart Fashion
DRIVE 2017 | 25 October - THE HUMAN TOUCH - Meaningful Data & Smart Fashion
 
[DigiHealth 22] Artificial intelligence in medicine - Kristijan Saric
[DigiHealth 22] Artificial intelligence in medicine - Kristijan Saric[DigiHealth 22] Artificial intelligence in medicine - Kristijan Saric
[DigiHealth 22] Artificial intelligence in medicine - Kristijan Saric
 
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthcare AI Systems

  • 2. Spark NLP for Healthcare: Lessons Learned Building Real-World Healthcare AI Systems. Veysel Kocaman, Sr. Data Scientist
  • 3. Agenda ▪ Introducing Spark NLP ▪ Problem areas in healthcare analytics ▪ Solving healthcare related NLP problems ▪ Case studies
  • 4. Introducing Spark NLP ● Natural Language Toolkit (NLTK): The complete toolkit for all NLP techniques. ● TextBlob: Easy-to-use NLP tools API, built on top of NLTK and Pattern. ● spaCy: Industrial-strength NLP with Python and Cython. ● Gensim: Topic Modelling for Humans ● Stanford CoreNLP: NLP services and packages by the Stanford NLP Group. ● fastText: NLP library by Facebook’s AI Research (FAIR) lab ● ... ● Spark NLP is an open-source natural language processing library, built on top of Apache Spark and Spark ML (initial release: Oct 2017). ○ A single unified solution for all your NLP needs ○ Takes advantage of transfer learning and implements the latest and greatest SOTA algorithms and models from NLP research ○ Fills a gap: there was no NLP library fully supported by Spark ○ Delivers a mission-critical, enterprise-grade NLP library (used by multiple Fortune 500 companies) ○ Full-time development team (26 new releases in 2018, 30 in 2019) https://medium.com/spark-nlp/introduction-to-spark-nlp-foundations-and-basic-components-part-i-c83b7629ed59
  • 5. Spark NLP Modules (Enterprise and Public)
  • 7. Introducing Spark NLP ● Python, Java, Scala and R ● “State of the art” means the best-performing academic peer-reviewed results ● Built on the Spark ML APIs ● Apache 2.0 licensed ● Active development & support ● Zero code changes to scale a pipeline to any Spark cluster ● The only open-source NLP library that is natively distributed ● Spark provides execution planning, caching, serialization and shuffling
  • 9. Sitting on the shoulders of Spark ML! ● Reusing the Spark ML Pipeline ● Unified NLP & ML pipelines ● End-to-end execution planning ● Serializable ● Distributable ● Reusing NLP functionality ● TF-IDF calculation ● String distance calculation ● Topic modeling ● Distributed ML algorithms
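To make the "Reusing the Spark ML Pipeline" point concrete, here is a minimal pure-Python sketch of the Estimator/Transformer contract that Spark ML pipelines follow: stateless stages only transform, stateful stages are first fit into a model, and a fitted pipeline replays its stages in order. All class names are hypothetical stand-ins, not Spark ML or Spark NLP classes, and rows are plain dicts rather than DataFrames.

```python
from typing import Dict, List

Row = Dict[str, object]


class Lowercaser:
    """Stateless transformer: writes a lowercased copy of 'text' to 'clean'."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "clean": str(r["text"]).lower()} for r in rows]


class WhitespaceTokenizer:
    """Stateless transformer: splits 'clean' on whitespace into 'tokens'."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "tokens": str(r["clean"]).split()} for r in rows]


class VocabularyEstimator:
    """Stateful estimator: fit() learns a vocabulary and returns a fitted model."""

    def fit(self, rows: List[Row]) -> "VocabularyModel":
        vocab = sorted({t for r in rows for t in r["tokens"]})
        return VocabularyModel({t: i for i, t in enumerate(vocab)})


class VocabularyModel:
    """Fitted transformer: maps tokens to ids, -1 for out-of-vocabulary tokens."""

    def __init__(self, index: Dict[str, int]):
        self.index = index

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "ids": [self.index.get(t, -1) for t in r["tokens"]]} for r in rows]


class Pipeline:
    """Fits estimators in order, feeding each stage the output of the previous one."""

    def __init__(self, stages: list):
        self.stages = stages

    def fit(self, rows: List[Row]) -> "PipelineModel":
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # estimator -> fitted model
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """A fitted pipeline: only transformers, applied in the original order."""

    def __init__(self, stages: list):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows
```

Usage: `Pipeline([Lowercaser(), WhitespaceTokenizer(), VocabularyEstimator()]).fit(train_rows)` returns a `PipelineModel` whose `transform` can be applied to new rows. The fit-once, transform-many split mirrors how a fitted Spark ML `PipelineModel` is saved and reused.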
  • 10. Word & Sentence Embeddings (vector dimensions) ● GloVe (100d, 200d, 300d) ● ELMo (512d, 1024d) ● BERT (768d) ● Universal Sentence Encoder (512d)
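The dimensions listed for each model are the length of its vectors. As a hedged illustration of how word vectors can be pooled into a sentence vector and compared, the sketch below mean-pools toy word vectors and computes cosine similarity. The 3-dimensional vectors are invented for the example, and mean pooling is only a common baseline; Universal Sentence Encoder and BERT build sentence representations differently.

```python
import math
from typing import Dict, List

# Toy 3-dimensional vectors invented for illustration only; real GloVe, ELMo
# or BERT vectors have hundreds of dimensions and are learned from corpora.
TOY_VECTORS: Dict[str, List[float]] = {
    "fever": [0.9, 0.1, 0.0],
    "pyrexia": [0.85, 0.15, 0.05],
    "fracture": [0.0, 0.2, 0.95],
}


def sentence_embedding(tokens: List[str], vectors: Dict[str, List[float]], dim: int = 3) -> List[float]:
    """Mean-pool the vectors of known tokens; unknown tokens are ignored."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With these toy values, "fever" lands much closer to "pyrexia" than to "fracture", which is the property clinical embeddings are meant to capture for real synonyms.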
  • 11. Clinical Word Embeddings ● Models: Clinical GloVe (200d), ICDO GloVe (200d), BioBERT, Clinical BERT ● Training corpora: PubMed + PMC; fine-tuned on PubMed + PMC + discharge summaries; PubMed + ICD10 + UMLS + MIMIC III; PubMed + PMC ● PubMed (abstracts) and PMC (full-text articles): https://www.nlm.nih.gov/bsd/difference.html
  • 14. Spark NLP Light Pipelines: faster runtime inference from Spark NLP pipelines. Spark is like a locomotive racing a bicycle. The bike will win if the load is light, since it is quicker to accelerate and more agile; with a heavy load, the locomotive might take a while to get up to speed, but it is going to be faster in the end. LightPipelines are Spark ML pipelines converted into a single-machine but multithreaded task, becoming more than 10x faster for smaller amounts of data (small is relative, but 50k sentences is roughly a good maximum).
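A rough single-machine analogue of the LightPipeline idea, sketched under stated assumptions: `annotate` is a hypothetical stand-in for a fitted pipeline (regex sentence splitting and tokenization), and a thread pool fans a small batch of documents out locally instead of scheduling a distributed Spark job. This is not Spark NLP's `LightPipeline` API; in CPython, threads only speed things up when the per-document work releases the GIL, so treat it as an illustration of the execution model rather than a benchmark.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def annotate(text: str) -> Dict[str, List[str]]:
    """Hypothetical stand-in for a fitted pipeline applied to one document."""
    # Split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Words, or single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {"sentence": sentences, "token": tokens}


def light_annotate(texts: List[str], max_workers: int = 4) -> List[Dict[str, List[str]]]:
    """Annotate a small batch locally with a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(annotate, texts))
```

`pool.map` returns results in the order of the inputs, so each output lines up with its document without any extra bookkeeping.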
  • 15. Spark NLP in Healthcare
  • 16. Spark NLP in Healthcare (figure: raw & unstructured healthcare data → clean & structured data) ● Less than 50% of structured data and less than 1% of unstructured data is leveraged for decision-making in companies (HBR). This is even worse in healthcare. ● NLP is ultra domain-specific, so train your own models.
  • 17. Spark NLP in Healthcare
  • 19. "(admission): 50.4 kg\n Height: 61 Inch\n ICP: 7 (1 - 14) mmHg\n Total In:\n 3,279 mL\n 911 mL\n PO:\n Tube feeding:\n 243 mL\n 237 mL\n IV Fluid:\n 2,827 mL\n 624 mL\n Blood products:\n Total out:\n 2,333 mL\n 370 mL\n Urine:\n 2,330 mL\n 370 mL\n NG:\n Stool:\n Drains:\n 3 mL\n Balance:\n 946 mL\n 541 mL\n Respiratory support\n O2 Delivery Device: None\n SPO2: 97%\n ABG: ///26/\n Physical Examination\n General Appearance: No acute distress, Non communicative due to\n language barrier\n HEENT: PERRL, EOMI\n Cardiovascular: (Rhythm: Regular)\n Respiratory / Chest: (Expansion: Symmetric), (Breath Sounds: CTA\n bilateral : ), (Sternum: Stable )\n Abdominal: Soft, Non- distended, Non-tender, Bowel sounds present\n Left Extremities: (Edema: Absent), (Temperature: Warm), (Pulse -\n Dorsalis pedis: Present), (Pulse - Posterior tibial: Present)\n Right Extremities: (Edema: Absent), (Temperature: Warm), (Pulse -\n Dorsalis pedis: Present), (Pulse - Posterior tibial: Present)\n Skin: (Incision: Clean / Dry / Intact)\n Neurologic: (Awake / Alert / Oriented: x 2), Follows simple commands,\n Moves all extremities, Limited due to language barrier\n Labs / Radiology\n 275 K/uL\n 9.8 g/dL\n 134 mg/dL\n 0.4 mg/dL\n 26 mEq/L\n 3.5 mEq/L\n 15 mg/dL\n 102 mEq/L\n 137 mEq/L\n 30.3 %\n 8.8 K/uL\n [image002.jpg]\n [**2140-7-23**] 03:30 PM\n [**2140-7-24**] 02:51 AM\n [**2140-7- 24**] 03:03 AM\n [**2140-7-24**] 08:13 AM\n [**2140-7-24**] 10:07 AM\n [**2140-7-25**] 02:45 AM\n [**2140-7-26**] 01:15 AM\n [**2140-7-27**] 03:09 AM\n [**2140-7-27**] 10:58 AM\n [**2140-7-28**] 02:58 AM\n WBC\n 9.7\n 10.3\n 11.2\n 7.7\n 7.1\n 8.8\n Hct\n 31.8\n 32.6\n 34.3\n 33.3\n 31.4\n 30.3\n Plt\n [**Telephone/Fax (3) 8785**]\n Creatinine\n 0.5\n 0.5\n 0.5\n 0.5\n 0.5\n 0.5\n 0.4\n TCO2\n 26\n 28\n 29\n Glucose\n 168\n 253\n 147\n 180\n 92\n 160\n 194\n 134\n Other labs: PT / PTT / INR:11.6/25.8/1.0, CK / CK-MB / Troponin\n T:54//<0.01, ALT / AST:25/32, Alk-Phos / T bili:87/,\n Differential-Neuts:93.0 %, Lymph:5.3 %, Mono:1.0 %, Eos:0.5 %, Lactic\n Acid:1.5 mmol/L, Ca:7.9 mg/dL, Mg:1.8 mg/dL, PO4:2.5 mg/dL\n Assessment and Plan\n AIRWAY, INABILITY TO PROTECT (RISK FOR ASPIRATION, ALTERED GAG, AIRWAY\n CLEARANCE, COUGH), CVA (STROKE, CEREBRAL INFARCTION), HEMORRHAGIC ,\n HYPERTENSION, BENIGN, [**Last Name 12**] PROBLEM - ENTER DESCRIPTION IN COMMENTS\n Assessment and Plan: 69 yo F w/ left cerebellar thrombotic stroke,\n hemorrhage, transtentorial herniation s/p EVD placement, surgical\n decompression on [**7-22**], now w/ improved neuro exams\n Neurologic: ICP monitor, Pain controlled, s/p crani for cerebellar\n CVA, moves all 4, EVD clamped." Output from one of the NLP libraries. Source: MIMIC-III dataset (an openly available dataset developed by the MIT Lab for Computational Physiology). Spark NLP in Healthcare
  • 20. Spark NLP in Healthcare
  • 21. Spark NLP in Healthcare: NLP library features and the state-of-the-art (SOTA) research behind them ● Named Entity Recognition: “Entity Recognition from Clinical Texts via Recurrent Neural Network”. Liu et al., BMC Medical Informatics and Decision Making, July 2017. ● Word Embeddings: ○ “How to Train Good Word Embeddings for Biomedical NLP”. Chiu et al., In Proceedings of BioNLP’16, August 2016. ○ “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. Devlin et al. (Google Research), October 2018. ● Assertion Status Detection: ○ “Improving Classification of Medical Assertions in Clinical Notes”. Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011. ○ “Neural Networks For Negation Scope Detection”. Fancellu et al., In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016. ● Entity Resolution: “CNN-based ranking for biomedical entity normalization”. Li et al., BMC Bioinformatics, October 2017.
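Entity resolution maps a free-text mention to a code in a standard terminology. The cited paper uses CNN-based ranking; the sketch below swaps in a much simpler technique, ranking candidate terms by character-level similarity with Python's `difflib.SequenceMatcher`. The three-entry ICD-10 dictionary is a tiny illustrative subset, not a real terminology service, and `resolve` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Tiny illustrative subset of ICD-10 codes; a real resolver would index a full terminology.
TERMINOLOGY: Dict[str, str] = {
    "I10": "essential (primary) hypertension",
    "E11": "type 2 diabetes mellitus",
    "I63": "cerebral infarction",
}


def resolve(mention: str, terminology: Dict[str, str], top_k: int = 3) -> List[Tuple[str, str, float]]:
    """Rank (code, description, score) candidates by string similarity to the mention."""
    query = mention.lower().strip()
    scored = [
        (code, desc, SequenceMatcher(None, query, desc).ratio())
        for code, desc in terminology.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]
```

Surface similarity breaks down on abbreviations and synonyms ("CVA" vs "cerebral infarction"), which is where learned ranking models such as the one cited earn their keep.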
  • 23. Clinical Named Entity Recognition ● Posology NER ● Anatomy NER ● PHI NER ● Clinical NER
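NER models of this kind usually emit one tag per token, and the tags are then merged into entity chunks. Assuming a BIO tagging scheme (B- begins an entity, I- continues it, O is outside), the merge step can be sketched as follows; the posology labels (`Drug`, `Strength`, `Frequency`) are hypothetical examples, not the labels of any specific Spark NLP model.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (chunk_text, label) pairs.

    An I- tag that does not continue a chunk of the same label is treated
    leniently as the start of a new chunk.
    """
    chunks: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []

    def flush() -> None:
        # Emit the currently open chunk, if any.
        if label is not None:
            chunks.append((" ".join(words), label))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new:
            flush()
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)
        else:  # "O": close any open chunk
            flush()
            label, words = None, []
    flush()
    return chunks
```

For example, tagging "Take aspirin 81 mg daily" as `O B-Drug B-Strength I-Strength B-Frequency` yields `[("aspirin", "Drug"), ("81 mg", "Strength"), ("daily", "Frequency")]`.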
  • 25. Clinical Assertion Model ● “Prescribing sick days due to diagnosis of influenza.” → Present ● “41 yo man with CRFs of DM Type II, high cholesterol, smoking history, family hx, HTN p/w episodes of atypical CP x 1 week, with rest and exertion.” → Conditional ● “Jane’s RIDT came back clean.” → Absent ● “Jane is at risk for flu if she’s not vaccinated.” → Hypothetical ● “There was a dense hemianopsia on the left side.” → Present ● “Neural Networks For Negation Scope Detection”. Fancellu et al., In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016. Scope of negation: given a negative instance, identify which tokens are affected by the negation.
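The assertion model on this slide is a trained neural model. For intuition only, the sketch below swaps in a NegEx-style rule baseline: it looks for cue phrases before the target entity and returns `absent` or `hypothetical`, defaulting to `present`; it does not attempt the `conditional` label. The cue lists are hypothetical and deliberately small, and rules like these miss cases such as “Jane’s RIDT came back clean”, which is one reason to learn the task from data.

```python
import re
from typing import Iterable

# Hypothetical, deliberately small cue lists (NegEx-style); not a trained model.
ABSENT_CUES = ["no", "denies", "negative for", "without", "ruled out"]
HYPOTHETICAL_CUES = ["at risk for", "should", "may develop"]


def has_cue(text: str, cues: Iterable[str]) -> bool:
    """True if any cue occurs in text as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(cue)}\b", text) for cue in cues)


def assertion_status(sentence: str, entity: str) -> str:
    """Return 'absent', 'hypothetical' or 'present' for entity in sentence."""
    text = sentence.lower()
    start = text.find(entity.lower())
    if start < 0:
        raise ValueError(f"entity {entity!r} not found in sentence")
    before = text[:start]  # cue lists are only searched before the entity
    if has_cue(before, ABSENT_CUES):
        return "absent"
    if has_cue(before, HYPOTHETICAL_CUES) or re.search(r"\bif\b", text):
        return "hypothetical"
    return "present"
```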
  • 26. Clinical Assertion Model Scope of negation: given a negative instance, identify which tokens are affected by the negation.
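Negation scope can be framed as token-level labelling: given a cue, mark the tokens it affects. Fancellu et al. learn this with neural networks; the sketch below uses a simple heuristic instead, marking up to `window` tokens after a cue and stopping at a terminator. The cue and terminator sets are hypothetical.

```python
from typing import List

# Hypothetical cue and terminator sets for illustration.
NEGATION_CUES = {"no", "not", "denies", "without"}
SCOPE_TERMINATORS = {"but", "however", ".", ";"}


def negation_scope(tokens: List[str], window: int = 5) -> List[bool]:
    """Return one flag per token: True if the token falls inside a negation scope."""
    in_scope = [False] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATION_CUES:
            # Scan forward from the cue until a terminator or the window limit.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j].lower() in SCOPE_TERMINATORS:
                    break
                in_scope[j] = True
    return in_scope
```

For "Patient denies fever or chills but reports cough .", only "fever", "or" and "chills" are flagged; "cough" stays outside the scope because "but" closes it.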
  • 27. Clinical Deidentification Model ● Identifies potential pieces of content containing personal information about patients and removes them by replacing them with semantic tags.
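Once PHI spans have been detected (for example by a PHI NER model), the replacement step is mechanical. A minimal sketch, assuming detected spans arrive as `(start, end, label)` character offsets with an exclusive end; a span that overlaps one already replaced is skipped. The `deidentify` helper is hypothetical, and the spans in the usage note are written by hand rather than produced by a model.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end is exclusive


def deidentify(text: str, spans: List[Span]) -> str:
    """Replace each PHI span with a semantic tag such as <NAME> or <DATE>."""
    out: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps a span that was already replaced
        out.append(text[cursor:start])  # keep the text between spans
        out.append(f"<{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

Usage: with `text = "John Smith was admitted on 2019-03-05."` and spans for the name and the date, the result is `"<NAME> was admitted on <DATE>."`.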
  • 31. Customer Case Studies 1. How SelectData uses AI to better understand home health patients 2. How Roche automated knowledge extraction from pathology and radiology reports 3. Improving patient flow forecasting at Kaiser Permanente 4. How Deep6 accelerates clinical trial recruitment
  • 32. SelectData: What is home health, and what problems are coming? Silver Tsunami: ● By 2022, more than 25 percent of US workers will be 55 or older ● Nearly 10,000 baby boomers reach retirement age each day ● Home health is expected to grow by 6.7% next year Expert Reviewer: ● The Bureau of Labor Statistics projects that the need for medical coders will increase by 15% by 2027 ● Healthcare data is used in decision-making Aging Baby Boomers: ● By 2039, the rate of Medicare spending and net interest on the national debt will exceed total projected revenues ● Payment reform focused on price reduction
  • 33. SelectData Problems vs Solutions TL;DR: we have more people, fewer qualified workers, and our clients are receiving less money for the care of each patient.
  • 34. SelectData ● OCR is difficult: different layouts, different scales, noise, rotation. ● High number of records and pages. ● Need for cluster processing. ● Cluster processing is difficult.
  • 36. SelectData ● We create a pipeline composed of annotators. ● The pipeline runs in a cluster. ● We can process many documents in parallel and scale out.
  • 43. Case 2: Roche Manual curation is extremely time-consuming, expensive, and error-prone. (Figures: Manually Curated TCGA Report; Sample Results from Curation)
  • 44. Case 2: Roche The NAVIFY team identified two significant needs: 1. Natural Language Processing (NLP): ● High accuracy ● Specialized for medical data ● Minimize time to train new models ● Extensible to new content types 2. Optical Character Recognition (OCR): ● High accuracy ● Retain document structure (e.g. tables, lists, paragraphs) Requirements for both: ● Scalable (support 10 million pathology reports per year) ● Compliant with privacy laws ● Integrates easily with AWS services ● Low cost Action plan: ● Initial goal of speeding up review of pathology reports ● Then automate extraction of high-confidence entities and relationships ● Keep increasing NLP automation over time
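For a sense of the sustained throughput implied by the scalability requirement, assuming the 10 million reports arrive evenly over a 365-day year (real workloads are burstier, so peak rates will be higher):

$$
\frac{10^{7}\ \text{reports/year}}{365\ \text{days/year}} \approx 27{,}397\ \text{reports/day} \approx 1{,}142\ \text{reports/hour} \approx 0.32\ \text{reports/s}
$$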
  • 45. Case 2: Roche How did Spark NLP help Roche?
  • 46. Case 2: Roche Lessons Learned ● Extracting text from domain-specific PDFs/images is unpredictable ● Quantitative evaluation of OCR is challenging ● Bridging the gap between domain knowledge & NLP requires consensus ● Evidence does not always match standard terminologies ● Building generalizable NLP pipelines: ○ Static components like tokenization, sentence detection, POS tagging and chunking can be reused ○ Data sources (hospitals) differ, so the NLP approach needs to be plug-and-play
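One way to read the plug-and-play lesson: keep the static components (sentence detection, tokenization) shared, and isolate what differs per data source behind a small registry of normalizers. A minimal sketch; the source names and both normalizers are hypothetical, chosen to echo exports that contain literal `\n` escapes and the `[**...**]` de-identification masks seen in the MIMIC-III note excerpt.

```python
import re
from typing import Callable, Dict, List

Normalizer = Callable[[str], str]

# Hypothetical source-specific normalizers; each real data source would need its own.
NORMALIZERS: Dict[str, Normalizer] = {
    # Export contains literal backslash-n sequences instead of real newlines.
    "hospital_a": lambda t: t.replace("\\n", "\n"),
    # MIMIC-style [**...**] masks collapsed to a single placeholder.
    "hospital_b": lambda t: re.sub(r"\[\*\*.*?\*\*\]", "[MASKED]", t),
}


def split_sentences(text: str) -> List[str]:
    """Shared static component: split after . ! ? or at newlines."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Shared static component: words or single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def process(text: str, source: str) -> List[List[str]]:
    """Apply the source's normalizer (identity if unknown), then the shared stages."""
    normalize = NORMALIZERS.get(source, lambda t: t)
    return [tokenize(s) for s in split_sentences(normalize(text))]
```

Adding a new hospital then means registering one normalizer, while the shared stages stay untouched.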
  • 47. Case 3: Kaiser Permanente Improving Patient Flow Forecasting
  • 48. Case 3: Kaiser Permanente Improving Patient Flow Forecasting Objectives Optimize the patient flow models & provide insights, for real-time decision-making and for strategic planning, by predicting: ● Bed demand ● 'Safe' staffing levels ● Hospital gridlock
  • 49. Case 3: Kaiser Permanente
  • 50. Case 4: Deep6 Feature engineering with Spark NLP to accelerate clinical trial recruitment (reducing the time it takes to find patients for trials) ● Your treatments are more than 15 years old ● Cutting-edge treatments are only available in clinical trials ● Faster cycles make lifesaving treatments available sooner
  • 56. Spark NLP resources ● Spark NLP Official page ● Spark NLP Workshop Repo ● JSL YouTube channel ● JSL Blogs ● Introduction to Spark NLP: Foundations and Basic Components (Part-I) ● Introduction to Spark NLP: Installation and Getting Started (Part-II) ● Named Entity Recognition with Bert in Spark NLP ● Text Classification in Spark NLP with Bert and Universal Sentence Encoders ● Spark NLP 101: Document Assembler ● Spark NLP 101: LightPipeline ● https://www.oreilly.com/radar/one-simple-chart-who-is-interested-in-spark-nlp/ ● https://blog.dominodatalab.com/comparing-the-functionality-of-open-source-natural-language-processing-libraries/ ● https://databricks.com/blog/2017/10/19/introducing-natural-language-processing-library-apache-spark.html ● https://databricks.com/fr/session/apache-spark-nlp-extending-spark-ml-to-deliver-fast-scalable-unified-natural-language-processing ● https://medium.com/@saif1988/spark-nlp-walkthrough-powered-by-tensorflow-9965538663fd ● https://www.kdnuggets.com/2019/06/spark-nlp-getting-started-with-worlds-most-widely-used-nlp-library-enterprise.html ● https://www.forbes.com/sites/forbestechcouncil/2019/09/17/winning-in-health-care-ai-with-small-data/#1b2fc2555664 ● https://medium.com/hackernoon/mueller-report-for-nerds-spark-meets-nlp-with-tensorflow-and-bert-part-1-32490a8f8f12 ● https://www.analyticsindiamag.com/5-reasons-why-spark-nlp-is-the-most-widely-used-library-in-enterprises/ ● https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-training-spark-nlp-and-spacy-pipelines ● https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-accuracy-performance-and-scalability ● https://www.infoworld.com/article/3031690/analytics/why-you-should-use-spark-for-machine-learning.html