Natural Language Processing for Medical Data
Dr. Anja Pilz, ML Conference 2021
About me
@anja_pilz
aplz
● PhD in machine learning & natural language processing from University of Bonn & Fraunhofer IAIS
● Now in industry: AI and data-driven products, since 2016 mostly in the medical and healthcare domain
● Main interests: NLP, especially German; information retrieval; recommender systems
Motivation
Doctors spend more time documenting what they do than actually treating patients
● 70% of work hours are dedicated to tasks not performed on the patient (organization & documentation)
Documentation is important because it covers symptoms, risk factors, intolerances, treatments, …
● each piece of information is vital for the patient, but can be buried somewhere
It is not only complex cases that quickly become “unscannable”
● Use NLP for information extraction: automatically search, analyze, and add structure to these unstructured texts
Source: Swiss Medical Journal, 2016;97(1):6–8
Motivation
Support doctors’ daily work
● create warnings from automatically detected risks and contraindications
● summarize suspected and excluded diagnoses (differential diagnosis)
● add hints to treatment guidelines
And much more!
Motivation
Support the billing process
● the billing process is super complex and needs to be watertight
● help medical controllers find relevant information
● automatically find mentions of diseases and treatments
● align with entries from catalogs used for billing (e.g. ICD-10)
Image: damedic code
NLP Tasks for Medical Data
Entity Recognition (NER): detect all relevant mentions
● diagnoses
● procedures
● body parts
● drugs
● measurements
● negations...
Entity Linking (NEL/NED): link to unique concepts
● entries in (curated) medical ontologies or catalogs
● normalization used for documentation, summarization, & billing
Entity Filtering: filter relevant entities (clinical, billing)
Challenges: Medical Domain is not News
Typical medical texts are very different from common NLP data
● super condensed and short, sometimes like an enumeration
● full of abbreviations, acronyms and technical terms
● ambiguity is often resolved through sheer knowledge, not necessarily by the local context
Example: “Indication: Acute hypoxia. Relapsed AML, GVHD, and renal failure with new hypoxia with clear chest x-ray.”
Challenges: Ambiguity
Abbreviations are used for convenience
● ambiguous ones may cause miscommunication
● potentially jeopardise patient care
Entity Linking needs to expand acronyms but must not rely on priors
Examples of ambiguous abbreviations:
● TMZ: temazepam / temozolomide
● LFT: liver function test / lung function test
● BCa: bladder cancer / breast cancer
● HWI: Harnwegsinfekt (urinary tract infection) / Hinterwandinfarkt (posterior wall myocardial infarction)
● VF: Vorhofflimmern (atrial fibrillation) / Vorhofflattern (atrial flutter)
● MS: Magensonde (gastric tube) / Mitralstenose (mitral stenosis)
Holper et al., Ambiguous medical abbreviation study: challenges and opportunities, Intern Med J. 2020
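One way to read "must not rely on priors" is that the expansion should be picked from the surrounding text rather than from how common each meaning is. The following is a minimal sketch under that reading; the cue word lists and the function name `expand_abbreviation` are illustrative assumptions, not part of the talk.

```python
import re

# Hypothetical cue words per expansion; a real system would curate or learn these.
EXPANSIONS = {
    "LFT": {
        "liver function test": {"bilirubin", "hepatitis", "alt", "ast", "liver"},
        "lung function test": {"spirometry", "asthma", "copd", "fev1", "lung"},
    },
    "TMZ": {
        "temazepam": {"insomnia", "sleep", "sedation"},
        "temozolomide": {"glioblastoma", "chemotherapy", "tumor"},
    },
}


def expand_abbreviation(abbreviation, context):
    """Return the expansion whose cue words overlap most with the context.

    Returns None when no cue word matches, instead of falling back to the
    most frequent meaning (the "prior"). Ties between expansions with the
    same positive score are broken arbitrarily in this sketch.
    """
    tokens = set(re.findall(r"\w+", context.lower()))
    candidates = EXPANSIONS.get(abbreviation, {})
    scored = [(len(cues & tokens), name) for name, cues in candidates.items()]
    best_score, best_name = max(scored, default=(0, None))
    return best_name if best_score > 0 else None
```

Returning None for an unsupported guess keeps the ambiguity visible to downstream steps, which seems safer in a clinical setting than silently choosing the more frequent reading.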
Challenges: German
Latin origin vs German spelling results in a bunch of variations
● Carcinom, Karcinom, Carzinom, Karzinom, Ca, CA
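The spelling variants listed here can be folded into one index key with a small pattern. This is a sketch only: the canonical form "karzinom" and the function name are assumptions, and the short form "Ca" is also a common abbreviation for calcium, so a real system would need context before rewriting it.

```python
import re

# Long forms: C/K at the start and c/z in the middle, also inside compounds
# such as "Prostatacarcinom". Short form: stand-alone "ca" only.
CARCINOMA_VARIANT = re.compile(r"[ck]ar[cz]inom|\bca\b")


def normalize_carcinoma(text):
    """Lowercase the text and map every spelling variant to 'karzinom'."""
    return CARCINOMA_VARIANT.sub("karzinom", text.lower())
```

Applied to both the catalog descriptions and the mentions, this makes "Mamma-Ca" and "Mammakarzinom"-style spellings share the same token stem before any retrieval step.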
The notorious compound words
● sensorineural hearing disorder: Schallempfindungsstörung
● retinal artery occlusion: Netzhautarterienverschluss
● detection of tuberculosis: Tuberkulosenachweis
Decompounding is non-trivial and requires profound linguistic knowledge
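To make the difficulty concrete, here is a deliberately naive lexicon-based splitter for the three compounds above. The lexicon, the list of linking elements, and the function name `split_compound` are assumptions for illustration; it takes the first segmentation it finds and has no notion of ambiguity, inflection, or unknown parts, which is exactly where the linguistic knowledge comes in.

```python
# Hypothetical mini lexicon covering only the example compounds.
LEXICON = {
    "schall", "empfindung", "störung",
    "netzhaut", "arterie", "verschluss",
    "tuberkulose", "nachweis",
}


def split_compound(word, lexicon, linkers=("", "s", "n", "en", "es")):
    """Split a compound into lexicon words, allowing linking elements
    (Fugenelemente) between the parts. Returns [] if no split is found."""
    word = word.lower()
    # Try the longest possible head first.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        if not rest:
            return [head]
        for linker in linkers:
            if rest.startswith(linker):
                tail = split_compound(rest[len(linker):], lexicon, linkers)
                if tail:
                    return [head] + tail
    return []
```

With this lexicon, "Netzhautarterienverschluss" is split by treating the "n" after "arterie" as a linking element; a word with any part missing from the lexicon simply yields an empty list.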
Entity Recognition (EN)
● data is available, e.g. BC5CDR (1500 PubMed articles with annotated chemicals, diseases & their interactions)
● trained models are available
● not “solved” but at a pretty good state of the art
https://scispacy.apps.allenai.org/
Entity Recognition (DE)
● typical off-the-shelf models are not useful for the medical domain
● need to train domain models here
Data?
Real patient data
● resides in hospitals and medical practices
● not publicly available
Public data
● netdoktor != Dr. B. Oss
● data in layman language does not compare well to real medical texts
● may still help
Patient: “Ich habe im Moment keine Blutdruckprobleme” (“I don’t have any blood pressure problems at the moment”)
Doctor: “RR gut eingestellt” (“blood pressure well controlled”)
Entity Recognition
Get data. Start annotating.
● entities are all concepts of interest:
drugs, medical conditions, procedures,
body parts, …
● annotation usually requires medical
expert knowledge
● super specific vocabulary with lots of
abbreviations & acronyms
● good to go after ~1k documents
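Annotation tools usually store entities as character spans, while many sequence taggers are trained on token-level tags. The sketch below converts spans to BIO tags; the BIO scheme, the label name CONDITION, and the simple regex tokenizer are assumptions, since the talk does not specify an annotation format.

```python
import re


def spans_to_bio(text, spans):
    """Convert character-span annotations to token-level BIO tags.

    spans: list of (start, end, label) with end exclusive.
    Tokens are runs of word characters or single punctuation marks.
    """
    tagged = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span only if it lies fully inside it.
            if match.start() >= start and match.end() <= end:
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged
```

For the sentence "Verdacht auf arterielle Hypertonie." with one span over "arterielle Hypertonie", the two tokens inside the span receive B- and I- tags and everything else is tagged O.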
Entity Recognition
Train your own model (+ data)
Entity Linking
Most work in research: link entity mentions to concepts in the medical thesaurus UMLS
● higher-level metadata enrichment
● index new publications by topic & keywords
● hot topic, and a bunch of publications exist
Why not use it here?
● no German version (yet)
● concepts are sometimes not specific enough
References:
● Murty et al., Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking, ACL 2018
● Kolitsas et al., End-to-End Neural Entity Linking, CoNLL 2018
● Mohan & Li, MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts, AKBC 2019
ICD-10 Linking
ICD: International Statistical Classification of Diseases and Related Health Problems
● catalogs mental and physical disorders in most specific and precise form
● global standard for clinical documentation and billing
● published by the WHO: https://icd.who.int/browse10/2019/en
● … comes with a German modification, ICD-10-GM (BfArM): https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
ICD-10 Linking
Higher clinical relevance
● support doctors: can’t get much more specific than with a diagnosis code
● support medical controllers: ICD codes are the items used in billing, not UMLS concepts
Requires entity filtering to avoid false positives
● excluded or suspected diagnoses
● “state after” diseases (German “Z.n.”): clinically relevant but not billing relevant
Examples from an EHR:
● “Keine Hinweis auf intrazerebrale Blutung.” (“No evidence of intracerebral hemorrhage.”)
● “Z.n. Hysterektomie, 2006” (“Status post hysterectomy, 2006”)
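Entity Filtering for primary coding
Most mentions may be clinically relevant, but not coding relevant.
Need relation extraction approaches here.
Examples:
● “Prostatacarcinom in der Vorgeschichte” (history of prostate carcinoma)
● “Vorbekannte Osteochondrose” (previously known osteochondrosis)
● “Z.n. mehrfachem Apoplexen, zuletzt 2006” (status post multiple strokes, most recently 2006)
● “Mamma-Ca wurde ausgeschlossen.” (breast cancer was ruled out)
● “Keine Hinweis auf intrazerebrale Blutung.” (no evidence of intracerebral hemorrhage)
● “Intrazerebrale Blutung konnte nicht bestätigt werden.” (intracerebral hemorrhage could not be confirmed)
● “Verdacht auf arterielle Hypertonie.” (suspected arterial hypertension)
● “Die BWK 9-Fraktur zeigte sich mit fehlender knöcherner Durchbauung im Sinne einer Pseudarthrose.” (the fracture of thoracic vertebra 9 showed no bony consolidation, consistent with pseudarthrosis)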
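As a baseline before any relation extraction, the example sentences on this slide can be separated with a handful of trigger phrases, in the spirit of NegEx-style rules. This is a swapped-in simplification, not the approach the talk calls for: the trigger list is taken only from the examples here, the status names are assumptions, and a whole sentence is treated as the scope of a trigger.

```python
import re

# Hypothetical trigger phrases, checked in order; first match wins.
TRIGGERS = [
    ("negated", r"\bkeine?\s+hinweis|ausgeschlossen|nicht\s+bestätigt"),
    ("suspected", r"\bverdacht\s+auf"),
    ("history", r"\bz\.\s?n\.|in\s+der\s+vorgeschichte|\bvorbekannt"),
]


def assertion_status(sentence):
    """Return 'negated', 'suspected', 'history', or 'affirmed' for a sentence."""
    lowered = sentence.lower()
    for status, pattern in TRIGGERS:
        if re.search(pattern, lowered):
            return status
    return "affirmed"
```

Under these rules only the pseudarthrosis sentence comes out as "affirmed" and would be passed on to coding; real discharge letters mix several diagnoses per sentence, which is where sentence-level triggers stop being sufficient.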
Toy example. Typical cases are much more complex.
ICD-10 Linking
To be really useful, the link must be super specific
● “some renal failure” (N17*) is not good enough
Specificity relates to the stage of the disease
● hugely affects treatment complexity and care intensity
● treatment complexity directly corresponds to the hospital’s bill sent to the insurance company
https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
Specificity
To describe a disease in a certain stage or
manifestation, the catalog is super specific
● 40 entries each for different instances of Diabetes Mellitus Type 1 and Type 2
● there are even more forms of Diabetes...
Difference is sometimes only one word
● “nicht” or “mit/ohne”: usual stopwords are
dangerous here!
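The effect of a generic stopword list can be shown in a few lines. The two descriptions below are simplified, hypothetical stand-ins for catalog entries that differ in a single word, and the stopword set is an assumed excerpt of what a general-purpose German list might contain.

```python
# Assumed excerpt of a general-purpose German stopword list.
GENERIC_STOPWORDS = frozenset({"mit", "ohne", "nicht", "und", "oder"})

# Hypothetical, simplified descriptions (not verbatim catalog entries).
WITH_COMPLICATIONS = "Diabetes mellitus mit Komplikationen"
WITHOUT_COMPLICATIONS = "Diabetes mellitus ohne Komplikationen"


def index_terms(description, stopwords=frozenset()):
    """Lowercase, split on whitespace, and drop the given stopwords."""
    return [t for t in description.lower().split() if t not in stopwords]
```

With `GENERIC_STOPWORDS` applied, both descriptions reduce to the same three terms and can no longer be told apart, so a domain-specific stopword list (or none at all) is the safer choice for catalog text.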
https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
Precision vs Context
ICD is completely different from Wikipedia
● catalog entries are precise descriptions without further context
● descriptions are not the most commonly used names
● descriptions tend to be very long: median number of words is 5, maximum is 28
● typically not used in this form by the doctors: low character overlap, low similarity
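What doctors write instead:
● “... RR 150/90...” (a blood pressure reading)
● “... rezidiv. Bluthochdruck mit Schwächegefühl...” (“recurrent high blood pressure with a feeling of weakness”)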
About Context..
Disambiguating information need not be located in the discharge letter
● can even be in a completely different
data format, e.g. lab measurements
● N18*: multiple measurements of a
specific lab value (Creatinine)
● not an NLP task anymore, time series
analysis?
https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
Entity Linking in Practice
Go-to solution for candidate retrieval: inverted index over catalog descriptions
● basically a vector space model with cosine similarity over (query, entry)
● make use of the analyzers that come with Lucene for tokenization, stemming, etc.
Secret sauce
● add medical knowledge and extend the descriptions (e.g. synonyms)
● handcraft the search query from the mention context
Gist: aim for high recall, you can’t link what you don’t find...
Pilz & Paaß, Collective Search for Concept
Disambiguation, COLING 2012
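The retrieval idea can be sketched without Lucene: a small inverted index selects candidate entries, and TF-IDF cosine similarity orders them. Everything here is an illustrative assumption: the toy catalog uses a few ICD-10-GM category codes with shortened titles, the synonym table stands in for the "secret sauce", and the tokenizer does no stemming.

```python
import math
import re
from collections import Counter, defaultdict

# Toy catalog for illustration; check codes and titles against the official catalog.
CATALOG = {
    "E10": "Diabetes mellitus, Typ 1",
    "E11": "Diabetes mellitus, Typ 2",
    "I10": "Essentielle (primäre) Hypertonie",
    "N17": "Akutes Nierenversagen",
    "N18": "Chronische Nierenkrankheit",
}
# Hypothetical synonym extension of the descriptions.
SYNONYMS = {"I10": ["Bluthochdruck"], "N17": ["ANV"]}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


class CatalogIndex:
    def __init__(self, catalog, synonyms=None):
        synonyms = synonyms or {}
        counts = {
            code: Counter(tokenize(" ".join([title, *synonyms.get(code, [])])))
            for code, title in catalog.items()
        }
        doc_freq = Counter(term for c in counts.values() for term in c)
        self.idf = {t: math.log(len(counts) / n) + 1.0 for t, n in doc_freq.items()}
        self.vectors = {
            code: {t: tf * self.idf[t] for t, tf in c.items()} for code, c in counts.items()
        }
        # Inverted index: term -> codes whose (extended) description contains it.
        self.postings = defaultdict(set)
        for code, c in counts.items():
            for term in c:
                self.postings[term].add(code)

    def search(self, query, top_k=5):
        """Return up to top_k codes sharing a term with the query, best first."""
        q = {t: tf * self.idf[t] for t, tf in Counter(tokenize(query)).items() if t in self.idf}
        candidates = set().union(*(self.postings[t] for t in q))
        ranked = sorted(((cosine(q, self.vectors[c]), c) for c in candidates), reverse=True)
        return [code for _, code in ranked[:top_k]]
```

Building the index once with and once without `SYNONYMS` shows the recall argument: the query "rezidiv. Bluthochdruck mit Schwächegefühl" finds I10 only when the description has been extended with the lay term.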
Demo
Can handle typos and spelling variations.
Query “diabetes meltus” fetches all codes for Diabetes mellitus.
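The demo does not show how typo tolerance is implemented. One simple option, used here purely as an assumption, is to snap each query token to the closest term in the index vocabulary with Python's standard `difflib` before searching; the vocabulary and the cutoff value are illustrative.

```python
import difflib

# Hypothetical index vocabulary (terms that occur in catalog descriptions).
VOCABULARY = ["diabetes", "mellitus", "typ", "akutes", "nierenversagen", "hypertonie"]


def correct_query(query, vocabulary, cutoff=0.75):
    """Replace each token by its closest vocabulary term, if one is similar enough."""
    corrected = []
    for token in query.lower().split():
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)
```

Tokens without a sufficiently similar vocabulary term are passed through unchanged, so acronyms still reach the synonym lookup instead of being rewritten into an unrelated word.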
Demo
Can handle alternative names like synonyms or acronyms.
Query “ANV 3” fetches all “Akutes Nierenversagen ... Stadium 3” (acute renal failure, stage 3) codes.
But which one is it? Retrieval alone cannot decide on the best candidate...
Best Candidate?
Recipe: rank by context similarity to decide on best candidate
● find expressive vector representations of mention-candidate pairs
○ word2vec
○ topic distributions (LDA)
○ graphical similarity …
● plug vectors into some ranking model
○ Ranking SVM
○ specific loss functions in Neural Networks (Hamming)
But we have seen: catalog does not provide extensive descriptions, so... Next time!
Pilz & Paaß, From names to entities using thematic
context distance, CIKM 2011
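The ranking step of the recipe can be sketched with the simplest possible representation: bag-of-words counts stand in for word2vec or LDA vectors, and plain cosine similarity stands in for a trained ranking model. The candidate descriptions below are hypothetical and longer than real catalog entries, which is precisely the limitation noted above.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def rank_candidates(context, candidates):
    """Order candidate names by similarity of their description to the mention context.

    candidates: dict mapping candidate name -> descriptive text.
    """
    context_vector = bag_of_words(context)
    return sorted(
        candidates,
        key=lambda name: cosine(context_vector, bag_of_words(candidates[name])),
        reverse=True,
    )


# Hypothetical descriptions for the two readings of "BCa".
BCA_CANDIDATES = {
    "bladder cancer": "urinary bladder tumor cystoscopy hematuria",
    "breast cancer": "breast tumor mammography mastectomy",
}
```

For the context "BCa, status post mastectomy, follow-up mammography" the breast cancer reading is ranked first; swapping `bag_of_words` for a learned embedding and `cosine` for a trained ranker keeps the same interface.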
Thanks!
Questions?
Say Hi!

More Related Content

What's hot

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Unstructured text to structured data
Unstructured text to structured dataUnstructured text to structured data
Unstructured text to structured dataDan Sullivan, Ph.D.
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease predictionKOYELMAJUMDAR1
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Data Mining in Healthcare: How Health Systems Can Improve Quality and Reduce...
Data Mining in Healthcare:  How Health Systems Can Improve Quality and Reduce...Data Mining in Healthcare:  How Health Systems Can Improve Quality and Reduce...
Data Mining in Healthcare: How Health Systems Can Improve Quality and Reduce...Health Catalyst
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsLarry Smarr
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnnKuppusamy P
 
Machine Learning in Healthcare
Machine Learning in HealthcareMachine Learning in Healthcare
Machine Learning in HealthcareBigR.io
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Alia Hamwi
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Hady Elsahar
 

What's hot (20)

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Unstructured text to structured data
Unstructured text to structured dataUnstructured text to structured data
Unstructured text to structured data
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease prediction
 
AI in Healthcare.pptx
AI in Healthcare.pptxAI in Healthcare.pptx
AI in Healthcare.pptx
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Big Data
Big DataBig Data
Big Data
 
Data Mining in Healthcare: How Health Systems Can Improve Quality and Reduce...
Data Mining in Healthcare:  How Health Systems Can Improve Quality and Reduce...Data Mining in Healthcare:  How Health Systems Can Improve Quality and Reduce...
Data Mining in Healthcare: How Health Systems Can Improve Quality and Reduce...
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare Diagnostics
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnn
 
Machine Learning in Healthcare
Machine Learning in HealthcareMachine Learning in Healthcare
Machine Learning in Healthcare
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
NLP
NLPNLP
NLP
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
 
BERT
BERTBERT
BERT
 
sentiment analysis
sentiment analysis sentiment analysis
sentiment analysis
 
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
Disease Prediction by Machine Learning Over Big Data From Healthcare CommunitiesDisease Prediction by Machine Learning Over Big Data From Healthcare Communities
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
 
1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
 

Similar to Natural Language Processing for Medical Data

Nlp for the precision medicine
Nlp for the precision medicineNlp for the precision medicine
Nlp for the precision medicineVishwas N
 
Clinical Text processing with Python
Clinical Text processing with PythonClinical Text processing with Python
Clinical Text processing with PythonGaurav Trivedi
 
Understanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsUnderstanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsAshis Chanda
 
Preparing a manuscript
Preparing a manuscriptPreparing a manuscript
Preparing a manuscriptlemberger
 
Natural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsNatural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsMMS Holdings
 
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18DataconomyGmbH
 
PhD Defense - Knowledge graphs based extension of patients’ files to predict ...
PhD Defense - Knowledge graphs based extension of patients’ files to predict ...PhD Defense - Knowledge graphs based extension of patients’ files to predict ...
PhD Defense - Knowledge graphs based extension of patients’ files to predict ...Raphaël Gazzotti
 
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Databricks
 
Preparing a manuscript
Preparing a manuscriptPreparing a manuscript
Preparing a manuscriptlemberger
 
Demystifying Text Analytics and NLP in Healthcare
Demystifying Text Analytics and NLP in HealthcareDemystifying Text Analytics and NLP in Healthcare
Demystifying Text Analytics and NLP in HealthcareHealth Catalyst
 
Experts decision making schemes 2018 tababa ytb 2 ss1
Experts decision making schemes 2018 tababa ytb 2   ss1Experts decision making schemes 2018 tababa ytb 2   ss1
Experts decision making schemes 2018 tababa ytb 2 ss1Imad Hassan
 
ShortTexet Lang Ophthalmology © 2000 Thieme.pdf
ShortTexet Lang Ophthalmology © 2000 Thieme.pdfShortTexet Lang Ophthalmology © 2000 Thieme.pdf
ShortTexet Lang Ophthalmology © 2000 Thieme.pdfMohammad Bawtag
 
OpenMRS Concept Management Tutorial
OpenMRS Concept Management TutorialOpenMRS Concept Management Tutorial
OpenMRS Concept Management Tutoriallnball
 
Paul f. jenkins making sense of the chest x-ray a hands-on guide (hodder arn...
Paul f. jenkins making sense of the chest x-ray  a hands-on guide (hodder arn...Paul f. jenkins making sense of the chest x-ray  a hands-on guide (hodder arn...
Paul f. jenkins making sense of the chest x-ray a hands-on guide (hodder arn...sarfaraz ahmed
 
ESWC2019 - Injecting domain knowledge in electronic medical records to improv...
ESWC2019 - Injecting domain knowledge in electronic medical records to improv...ESWC2019 - Injecting domain knowledge in electronic medical records to improv...
ESWC2019 - Injecting domain knowledge in electronic medical records to improv...Raphaël Gazzotti
 
Xavier Amatriain, Cofounder & CTO, Curai at MLconf SF 2017
Xavier Amatriain, Cofounder & CTO, Curai at MLconf SF 2017Xavier Amatriain, Cofounder & CTO, Curai at MLconf SF 2017
Xavier Amatriain, Cofounder & CTO, Curai at MLconf SF 2017MLconf
 
Health Diagnostic Reasoning.pdf
Health Diagnostic Reasoning.pdfHealth Diagnostic Reasoning.pdf
Health Diagnostic Reasoning.pdfBrian712019
 
Medical advice as a Recommender System
Medical advice as a Recommender SystemMedical advice as a Recommender System
Medical advice as a Recommender SystemXavier Amatriain
 

Similar to Natural Language Processing for Medical Data (20)

Nlp for the precision medicine
Nlp for the precision medicineNlp for the precision medicine
Nlp for the precision medicine
 
Clinical Text processing with Python
Clinical Text processing with PythonClinical Text processing with Python
Clinical Text processing with Python
 
Understanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsUnderstanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methods
 
Preparing a manuscript
Preparing a manuscriptPreparing a manuscript
Preparing a manuscript
 
Natural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health RecordsNatural Language Processing to Curate Unstructured Electronic Health Records
Natural Language Processing to Curate Unstructured Electronic Health Records
 
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
Data Science in Clinical Care | Johannes Starlinger, Charité | DN18
 
PhD Defense - Knowledge graphs based extension of patients’ files to predict ...
PhD Defense - Knowledge graphs based extension of patients’ files to predict ...PhD Defense - Knowledge graphs based extension of patients’ files to predict ...
PhD Defense - Knowledge graphs based extension of patients’ files to predict ...
 
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
 
Preparing a manuscript
Preparing a manuscriptPreparing a manuscript
Preparing a manuscript
 
Demystifying Text Analytics and NLP in Healthcare
Demystifying Text Analytics and NLP in HealthcareDemystifying Text Analytics and NLP in Healthcare
Demystifying Text Analytics and NLP in Healthcare
 
CV C.Speck
CV C.SpeckCV C.Speck
CV C.Speck
 
Experts decision making schemes 2018 tababa ytb 2 ss1
Experts decision making schemes 2018 tababa ytb 2   ss1Experts decision making schemes 2018 tababa ytb 2   ss1
Experts decision making schemes 2018 tababa ytb 2 ss1
 
ShortTexet Lang Ophthalmology © 2000 Thieme.pdf
ShortTexet Lang Ophthalmology © 2000 Thieme.pdfShortTexet Lang Ophthalmology © 2000 Thieme.pdf
ShortTexet Lang Ophthalmology © 2000 Thieme.pdf
 
OpenMRS Concept Management Tutorial
OpenMRS Concept Management TutorialOpenMRS Concept Management Tutorial
OpenMRS Concept Management Tutorial
 
Paul f. jenkins making sense of the chest x-ray a hands-on guide (hodder arn...
Paul f. jenkins making sense of the chest x-ray  a hands-on guide (hodder arn...Paul f. jenkins making sense of the chest x-ray  a hands-on guide (hodder arn...
Paul f. jenkins making sense of the chest x-ray a hands-on guide (hodder arn...
 
ESWC2019 - Injecting domain knowledge in electronic medical records to improv...
ESWC2019 - Injecting domain knowledge in electronic medical records to improv...ESWC2019 - Injecting domain knowledge in electronic medical records to improv...
ESWC2019 - Injecting domain knowledge in electronic medical records to improv...
 
ML to cure the world
ML to cure the worldML to cure the world
ML to cure the world
 
Xavier Amatriain, Cofounder & CTO, Curai at MLconf SF 2017
Xavier Amatriain, Cofounder & CTO, Curai at MLconf SF 2017Xavier Amatriain, Cofounder & CTO, Curai at MLconf SF 2017
Xavier Amatriain, Cofounder & CTO, Curai at MLconf SF 2017
 
Health Diagnostic Reasoning.pdf
Health Diagnostic Reasoning.pdfHealth Diagnostic Reasoning.pdf
Health Diagnostic Reasoning.pdf
 
Medical advice as a Recommender System
Medical advice as a Recommender SystemMedical advice as a Recommender System
Medical advice as a Recommender System
 

Recently uploaded

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 

Recently uploaded (20)

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 

Natural Language Processing for Medical Data

  • 2. About me
    Dr. Anja Pilz, ML Conference 2021
    @anja_pilz, aplz
    ● PhD in machine learning & natural language processing from University of Bonn & Fraunhofer IAIS
    ● Now in industry: AI and data-driven products; since 2016 mostly in the medical and healthcare domain
    ● Main interests: NLP, especially German; information retrieval; recommender systems
  • 3. Motivation
    Doctors spend more time documenting what they do than on actual treatment
    ● 70% of work hours are dedicated to tasks not performed on the patient (organization & documentation)
    Documentation is important because it covers symptoms, risk factors, intolerances, treatments, …
    ● each piece of information is vital for the patient, but can be buried somewhere
    It is not only complex cases that quickly become “unscannable”
    ● Use NLP for Information Extraction: automatically search, analyze, and add structure to these unstructured texts
    Source: Swiss Medical Journal, 2016;97(1):6–8
  • 4. Motivation
    Support doctors’ daily work
    ● create warnings from automatically detected risks and contraindications
    ● summarize suspected and excluded diagnoses (differential diagnosis)
    ● add hints to treatment guidelines
    And much more!
  • 5. Motivation
    Support the billing process
    ● the billing process is highly complex and needs to be watertight
    ● help medical controllers find relevant information
    ● automatically find mentions of diseases and treatments
    ● align with entries from catalogs used for billing (e.g. ICD-10)
    (Image: damedic code)
  • 6. NLP Tasks for Medical Data
    Entity Recognition (NER): detect all relevant mentions
    ● diagnoses
    ● procedures
    ● body parts
    ● drugs
    ● measurements
    ● negations...
    Entity Linking (NEL/NED): link to unique concepts
    ● entries in (curated) medical ontologies or catalogs
    ● normalization used for documentation, summarization, & billing
    Entity Filtering: filter relevant entities (clinical, billing)
  • 7. Challenges: Medical Domain is not News
    Typical medical texts are very different from common NLP data
    ● super condensed and short, sometimes like an enumeration
    ● full of abbreviations, acronyms and technical terms
    ● ambiguity is often resolved through sheer knowledge, not necessarily by the local context
    Example: “Indication: Acute hypoxia. Relapsed AML, GVHD, and renal failure with new hypoxia with clear chest x-ray.”
  • 8. Challenges: Ambiguity
    Abbreviations are used for convenience
    ● ambiguous ones may cause miscommunication
    ● potentially jeopardise patient care
    Entity Linking needs to expand acronyms but must not rely on priors
    Examples of ambiguous abbreviations:
    ● TMZ: temazepam / temozolomide
    ● LFT: liver function test / lung function test
    ● BCa: bladder cancer / breast cancer
    ● HWI: Harnwegsinfekt (urinary tract infection) / Hinterwandinfarkt (posterior wall infarction)
    ● VF: Vorhofflimmern (atrial fibrillation) / Vorhofflattern (atrial flutter)
    ● MS: Magensonde (gastric tube) / Mitralstenose (mitral stenosis)
    Holper et al., Ambiguous medical abbreviation study: challenges and opportunities, Intern Med J. 2020
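One way to read "must not rely on priors" is that an expansion should be chosen from evidence in the surrounding text, and left open when there is none. The following is a minimal sketch under that assumption; the sense inventory `SENSES`, its cue words, and the function `expand` are hypothetical and not taken from the talk.

```python
# Hypothetical sense inventory: abbreviation -> {expansion: context cue words}.
SENSES = {
    "LFT": {
        "liver function test": {"bilirubin", "alt", "ast", "hepatic", "liver"},
        "lung function test": {"spirometry", "fev1", "copd", "lung", "asthma"},
    },
    "HWI": {
        "Harnwegsinfekt": {"urin", "dysurie", "nitrit", "blase"},
        "Hinterwandinfarkt": {"ekg", "troponin", "stemi", "koronar"},
    },
}


def expand(abbreviation, context):
    """Return the expansion whose cue words overlap the context most.

    Returns None when no sense has any overlap or when senses tie,
    instead of falling back to a prior such as the most frequent sense.
    """
    tokens = {t.strip(".,;:()").lower() for t in context.split()}
    scores = {
        expansion: len(cues & tokens)
        for expansion, cues in SENSES.get(abbreviation, {}).items()
    }
    if not scores:
        return None  # unknown abbreviation
    best = max(scores.values())
    winners = [e for e, s in scores.items() if s == best]
    if best == 0 or len(winners) > 1:
        return None  # no evidence, or ambiguous evidence
    return winners[0]


print(expand("LFT", "LFT abnormal, bilirubin and ALT elevated"))
print(expand("HWI", "Z.n. HWI"))
```

Returning `None` for "Z.n. HWI" is deliberate: the local context carries no cue, which matches the earlier observation that such ambiguity is often resolved through knowledge outside the sentence.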
  • 9. Challenges: German
    Latin origin vs German spelling results in many spelling variants
    ● Carcinom, Karcinom, Carzinom, Karzinom, Ca, CA
    The notorious compound words
    ● sound perception disorder (sensorineural hearing loss): Schallempfindungsstörung
    ● occlusion of the central retinal artery: Netzhautarterienverschluss
    ● detection of tuberculosis: Tuberkulosenachweis
    Decompounding is non-trivial and requires profound linguistic knowledge
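Both problems can be illustrated with a few lines of code. The sketch below is only an assumption about how one might start: `normalize_spelling` applies two crude c/k/z rewrite rules, and `decompound` is a dictionary-based splitter over a tiny hand-made `LEXICON` with a handful of linking elements. Neither is the approach used in the talk, and real decompounding needs far more linguistic knowledge than this.

```python
import re

# Hypothetical toy lexicon of compound parts (lowercased).
LEXICON = {
    "schall", "empfindung", "störung", "netzhaut",
    "arterie", "verschluss", "tuberkulose", "nachweis",
}
# Linking elements ("Fugenelemente") that may follow a compound part.
LINKERS = ("", "s", "n", "en", "es")


def normalize_spelling(word):
    """Map Latin-style c spellings onto German k/z spellings."""
    w = word.lower()
    w = re.sub(r"c(?=[eiyäö])", "z", w)  # c before a front vowel -> z
    w = re.sub(r"c(?![hk])", "k", w)     # remaining c (not ch/ck) -> k
    return w


def decompound(word, lexicon=LEXICON):
    """Return one split of `word` into lexicon entries, or None."""
    w = word.lower()
    if w in lexicon:
        return [w]
    for i in range(len(w) - 1, 0, -1):  # try longer first parts first
        head, rest = w[:i], w[i:]
        for linker in LINKERS:
            if not head.endswith(linker):
                continue
            stem = head[: len(head) - len(linker)]
            if stem in lexicon:
                tail = decompound(rest, lexicon)
                if tail:
                    return [stem] + tail
    return None


print({normalize_spelling(v) for v in ["Carcinom", "Karcinom", "Carzinom", "Karzinom"]})
print(decompound("Schallempfindungsstörung"))
print(decompound("Tuberkulosenachweis"))
```

"Tuberkulosenachweis" shows why this is non-trivial even in a toy setting: the splitter first tries "tuberkulose" + linking "n" + "achweis", fails, and only then finds "tuberkulose" + "nachweis". Abbreviations such as "Ca" are not covered by spelling rules and would need a separate lexicon.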
  • 10. Entity Recognition (EN)
    ● data is available, e.g. BC5CDR (1500 PubMed articles with annotated chemicals, diseases & their interactions)
    ● trained models are available
    ● not “solved”, but the state of the art is pretty good
    https://scispacy.apps.allenai.org/
  • 11. Entity Recognition (DE)
    ● typical off-the-shelf models are not useful for the medical domain
    ● need to train domain models here
  • 12. Data?
    Real patient data
    ● resides in hospitals and medical practices
    ● not publicly available
    Public data
    ● netdoktor != Dr. B. Oss
    ● data in layman's language does not compare well to real medical texts
    ● may still help
    Patient: “Ich habe im Moment keine Blutdruckprobleme” (“I have no blood pressure problems at the moment”)
    Doctor: “RR gut eingestellt” (“BP well controlled”)
  • 13. Entity Recognition
    Get data. Start annotating.
    ● entities are all concepts of interest: drugs, medical conditions, procedures, body parts, …
    ● annotation usually requires medical expert knowledge
    ● super specific vocabulary with lots of abbreviations & acronyms
    ● good to go after ~1k documents
  • 14. Entity Recognition
    Train your own model (+ data)
  • 15. Entity Linking
    Most work in research: link entity mentions to concepts in the medical thesaurus UMLS
    ● higher level metadata enrichment
    ● index new publications by topic & keywords
    ● hot topic, and many publications exist
    Why not?
    ● no German version (yet)
    ● concepts are sometimes not specific enough
    References:
    ● Murty et al., Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking, ACL 2018
    ● Kolitsas et al., End-to-End Neural Entity Linking, CoNLL 2018
    ● Mohan & Li, MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts, AKBC 2019
  • 16. ICD-10 Linking
    ICD: International Statistical Classification of Diseases and Related Health Problems
    ● catalogs mental and physical disorders in the most specific and precise form
    ● global standard for clinical documentation and billing
    ● published yearly by the WHO
    https://icd.who.int/browse10/2019/en
  • 17. ICD-10 Linking
    ICD: International Statistical Classification of Diseases and Related Health Problems
    ● catalogs mental and physical disorders in the most specific and precise form
    ● global standard for clinical documentation and billing
    ● published yearly by the WHO
    ● … comes with the German modification ICD-10-GM (BfArM)
    https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
  • 18. ICD-10 Linking
    Higher clinical relevance
    ● support doctors: can’t get much more specific than with a diagnosis code
    ● support medical controllers: ICD codes are the items used in billing, not UMLS concepts
    Requires entity filtering to avoid false positives
    ● excluded or suspected diagnoses
    ● “state after” diseases: clinically relevant but not billing relevant
    EHR examples:
    ● Keine Hinweis auf intrazerebrale Blutung. (No evidence of intracerebral hemorrhage.)
    ● Z.n. Hysterektomie, 2006 (Status post hysterectomy, 2006)
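  • 19. Entity Filtering for primary coding
    Most mentions may be clinically relevant, but not coding relevant. Relation extraction approaches are needed here.
    Examples:
    ● Prostatacarcinom in der Vorgeschichte (Prostate carcinoma in the medical history)
    ● Vorbekannte Osteochondrose (Previously known osteochondrosis)
    ● Z.n. mehrfachem Apoplexen, zuletzt 2006 (Status post multiple strokes, most recently 2006)
    ● Mamma-Ca wurde ausgeschlossen. (Breast cancer was ruled out.)
    ● Keine Hinweis auf intrazerebrale Blutung. (No evidence of intracerebral hemorrhage.)
    ● Die BWK 9-Fraktur zeigte sich mit fehlender knöcherner Durchbauung im Sinne einer Pseudarthrose. (The T9 vertebral fracture showed no bony consolidation, consistent with a pseudarthrosis.)
    ● Intrazerebrale Blutung konnte nicht bestätigt werden. (Intracerebral hemorrhage could not be confirmed.)
    ● Verdacht auf arterielle Hypertonie. (Suspected arterial hypertension.)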
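To make the filtering problem concrete, here is a deliberately simple cue-based baseline. It is not the relation extraction approach the slide calls for: the cue patterns in `CUES` are a small hand-picked assumption, and a cue anywhere in the sentence is applied to the whole sentence without modelling its scope.

```python
import re

# Hypothetical cue patterns, checked in this order (lowercased text).
CUES = [
    ("negated", r"\bkein(e|en|er)?\b|\bausgeschlossen\b|\bnicht bestätigt\b"),
    ("suspected", r"\bverdacht auf\b|\bv\.\s?a\."),
    ("history", r"\bz\.\s?n\.|\bvorgeschichte\b|\bvorbekannt"),
]


def assertion_status(sentence):
    """Label a sentence by the first matching cue, else 'affirmed'."""
    text = sentence.lower()
    for label, pattern in CUES:
        if re.search(pattern, text):
            return label
    return "affirmed"


EXAMPLES = [
    "Prostatacarcinom in der Vorgeschichte",
    "Z.n. mehrfachem Apoplexen, zuletzt 2006",
    "Mamma-Ca wurde ausgeschlossen.",
    "Keine Hinweis auf intrazerebrale Blutung.",
    "Verdacht auf arterielle Hypertonie.",
    "Die BWK 9-Fraktur zeigte sich mit fehlender knöcherner Durchbauung "
    "im Sinne einer Pseudarthrose.",
]
for sentence in EXAMPLES:
    print(assertion_status(sentence), "|", sentence)
```

Only mentions labelled `affirmed` would be passed on as coding candidates in this sketch. The fracture sentence shows the limit of cue lists: "fehlender" negates the bony consolidation, not the fracture, so deciding which entity a cue modifies needs more than pattern matching.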
  • 20. Toy example. Typical cases are much more complex.
  • 21. ICD-10 Linking
    To be really useful, the link must be super specific
    ● “some renal failure” (N17*) is not good enough
    Specificity relates to the stage of the disease
    ● hugely affects treatment complexity and care intensity
    ● treatment complexity directly corresponds to the hospital’s bill sent to the insurance company
    https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
  • 22. Specificity
    To describe a disease in a certain stage or manifestation, the catalog is super specific
    ● 40 entries each for different instances of Diabetes Mellitus Type 1 and Type 2
    ● there are even more forms of Diabetes...
    The difference is sometimes only one word
    ● “nicht” (not) or “mit/ohne” (with/without): the usual stopwords are dangerous here!
    https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
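The stopword danger is easy to reproduce. In the sketch below, `GERMAN_STOPWORDS` is a small hand-picked set standing in for a typical stopword list, and the two descriptions are illustrative strings modelled on catalog wording rather than verified catalog entries.

```python
# Hand-picked stand-in for a typical German stopword list (assumption).
GERMAN_STOPWORDS = {"mit", "ohne", "nicht", "als", "und", "oder", "der", "die", "das"}


def tokens(text, drop_stopwords):
    """Lowercase, strip punctuation, and optionally drop stopwords."""
    words = [w.strip(",.:;()").lower() for w in text.split()]
    return [w for w in words if w and not (drop_stopwords and w in GERMAN_STOPWORDS)]


# Illustrative descriptions that differ in a single word.
entry_a = "Diabetes mellitus, nicht als entgleist bezeichnet"
entry_b = "Diabetes mellitus, als entgleist bezeichnet"

print(tokens(entry_a, drop_stopwords=True))
print(tokens(entry_b, drop_stopwords=True))
print(tokens(entry_a, drop_stopwords=True) == tokens(entry_b, drop_stopwords=True))
```

With stopword removal switched on, both descriptions collapse to the same token list, so a retrieval system could no longer tell the two entries apart. Keeping such words in the index is the safer default for catalog descriptions.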
  • 23. Precision vs Context
    ICD is completely different from Wikipedia
    ● catalog entries are precise descriptions without further context
    ● descriptions are not the most commonly used names
    ● descriptions tend to be very long: median number of words is 5, maximum is 28
    ● typically not used in this form by the doctors: low character overlap, low similarity
    Examples of how doctors write:
    ● ... RR 150/90...
    ● ... rezidiv. Bluthochdruck mit Schwächegefühl... (recurrent high blood pressure with a feeling of weakness)
  • 24. About Context
    Disambiguating information need not be located in the discharge letter
    ● can even be in a completely different data format, e.g. lab measurements
    ● N18*: multiple measurements of a specific lab value (Creatinine)
    ● not an NLP task anymore, time series analysis?
    https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
  • 25. Entity Linking in Practice
    Go-to solution for candidate retrieval: inverted index over catalog descriptions
    ● basically a vector space model with cosine similarity over (query, entry)
    ● make use of the analyzers that come with Lucene for tokenization, stemming, etc.
    Secret sauce
    ● add medical knowledge and extend the descriptions (e.g. synonyms)
    ● hand-craft the search query from the mention context
    Gist: aim for high recall, you can’t link what you don’t find...
    Pilz & Paaß, Collective Search for Concept Disambiguation, COLING 2012
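The retrieval idea can be sketched without Lucene. The class `CatalogIndex` below is a hypothetical pure-Python stand-in: an inverted index with TF-IDF weights and cosine scoring, using whitespace tokenization instead of Lucene's analyzers. The codes and descriptions in `CATALOG` and the `SYNONYMS` table are made up for illustration.

```python
import math
from collections import Counter, defaultdict


class CatalogIndex:
    """Tiny inverted index with TF-IDF weights and cosine scoring."""

    def __init__(self, entries):
        self.postings = defaultdict(dict)  # term -> {code: term frequency}
        for code, text in entries.items():
            for term, tf in Counter(text.lower().split()).items():
                self.postings[term][code] = tf
        n = len(entries)
        self.idf = {t: math.log(1 + n / len(p)) for t, p in self.postings.items()}
        self.norm = defaultdict(float)  # code -> squared length of its TF-IDF vector
        for term, posting in self.postings.items():
            for code, tf in posting.items():
                self.norm[code] += (tf * self.idf[term]) ** 2

    def search(self, query, k=5):
        """Return up to k (code, cosine) pairs; unknown query terms are ignored."""
        weights = {
            t: tf * self.idf[t]
            for t, tf in Counter(query.lower().split()).items()
            if t in self.idf
        }
        q_norm = math.sqrt(sum(w * w for w in weights.values()))
        if q_norm == 0:
            return []
        dot = defaultdict(float)
        for term, q_weight in weights.items():
            for code, tf in self.postings[term].items():
                dot[code] += q_weight * tf * self.idf[term]
        scored = [(d / (q_norm * math.sqrt(self.norm[c])), c) for c, d in dot.items()]
        return [(c, s) for s, c in sorted(scored, reverse=True)[:k]]


# Hypothetical codes and illustrative descriptions, not real catalog entries.
CATALOG = {
    "A1": "akutes nierenversagen stadium 1",
    "A2": "akutes nierenversagen stadium 2",
    "A3": "akutes nierenversagen stadium 3",
    "B1": "chronische nierenkrankheit stadium 3",
    "C1": "diabetes mellitus typ 2",
}
# "Secret sauce": extend descriptions with known synonyms or acronyms.
SYNONYMS = {"A1": "anv", "A2": "anv", "A3": "anv"}
EXTENDED = {c: f"{d} {SYNONYMS.get(c, '')}".strip() for c, d in CATALOG.items()}

plain = CatalogIndex(CATALOG)
extended = CatalogIndex(EXTENDED)
print([code for code, _ in plain.search("ANV 3")])
print([code for code, _ in extended.search("ANV 3")])
```

Without the synonym extension, the query "ANV 3" only matches on the token "3" and misses two of the three acute renal failure entries. With the extension, all of them are retrieved, which is the recall argument made in the gist above.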
  • 26. Demo
    Can handle typos and spelling variations.
    Query: “diabetes meltus” fetches all codes for Diabetes mellitus.
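The talk does not say how the demo achieves typo tolerance. One common technique, shown here purely as an assumption, is to compare character trigrams; the helper names `char_ngrams` and `dice` are hypothetical.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b):
    """Dice coefficient over character trigram sets (0.0 to 1.0)."""
    x, y = char_ngrams(a), char_ngrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


query = "diabetes meltus"
for description in ["diabetes mellitus", "akutes nierenversagen"]:
    print(description, dice(query, description))
```

The misspelled query still shares most of its trigrams with "diabetes mellitus" and almost none with an unrelated description, so a threshold or ranking on this score would recover the intended entries.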
  • 27. Demo
    Can handle alternative names like synonyms or acronyms.
    Query “ANV 3” fetches all “Akutes Nierenversagen ... Stadium 3” (acute renal failure, stage 3) codes.
    But which one is it? Retrieval alone cannot decide on the best candidate...
  • 28. Best Candidate?
    Recipe: rank by context similarity to decide on the best candidate
    ● find expressive vector representations of mention-candidate pairs
      ○ word2vec
      ○ topic distributions (LDA)
      ○ graphical similarity …
    ● plug vectors into some ranking model
      ○ Ranking SVM
      ○ specific loss functions in Neural Networks (Hamming)
    But we have seen: the catalog does not provide extensive descriptions, so... Next time!
    Pilz & Paaß, From names to entities using thematic context distance, CIKM 2011
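As a rough illustration of ranking by context similarity, the sketch below swaps the representations named in the recipe (word2vec, LDA) for plain bag-of-words vectors and replaces the ranking model with a sort by cosine similarity. The function names and the example context are assumptions made for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector with surrounding punctuation stripped."""
    return Counter(w.strip(".,;:()").lower() for w in text.split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def rank(context, candidates):
    """Order candidate codes by similarity of their description to the context."""
    ctx = bow(context)
    return sorted(
        candidates, key=lambda code: cosine(ctx, bow(candidates[code])), reverse=True
    )


# Two candidate concepts for the ambiguous mention "BCa".
candidates = {
    "C67": "malignant neoplasm of bladder",
    "C50": "malignant neoplasm of breast",
}
context = "BCa, cystoscopy showed tumor in bladder wall"
print(rank(context, candidates))
```

Here a single shared word decides the ranking, which also shows the weakness noted on the slide: short catalog descriptions give the similarity measure very little to work with.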
  • 29. Thanks! Questions? Say Hi!