Natural Language Processing for Medical Data
Dr. Anja Pilz, ML Conference 2021
About me
@anja_pilz
aplz
● PhD in machine learning & natural language processing from the University of Bonn & Fraunhofer IAIS
● Now in industry: AI and data-driven products, since 2016 mostly in the medical and healthcare domain
● Main interests: NLP, especially German; information retrieval; recommender systems
Motivation
Doctors spend more time documenting what they do than on actual treatment
● 70% of working hours are dedicated to tasks not performed on the patient (organization & documentation)
Documentation is important because it covers symptoms, risk factors, intolerances, treatments, …
● each piece of information is vital for the patient, but it can be buried anywhere
Records quickly become “unscannable”, and not only in complex cases
● Use NLP for information extraction: automatically search, analyze, and add structure to these unstructured texts
Swiss Medical Journal, 2016;97(1):6–8
Motivation
Support doctors’ daily work
● create warnings from automatically detected risks and contraindications
● summarize suspected and excluded diagnoses (differential diagnosis)
● add hints to treatment guidelines
And much more!
Motivation
Support the billing process
● the billing process is super complex and needs to be watertight
● help medical controllers find relevant information
● automatically find mentions of diseases and treatments
● align them with entries from catalogs used for billing (e.g. ICD-10)
(Image: damedic code)
NLP Tasks for Medical Data
Entity Recognition (NER): detect all relevant mentions:
● diagnoses
● procedures
● body parts
● drugs
● measurements
● negations...
Entity Linking (NEL/NED): link to unique concepts:
● entries in (curated) medical ontologies or catalogs
● normalization used for documentation, summarization, & billing
Entity Filtering: filter relevant entities (clinical, billing)
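The three stages can be chained as a simple pipeline. Below is a minimal Python sketch, assuming each stage is a plain function. `Mention`, `run_pipeline` and the toy components are hypothetical names and not part of any library. The dictionary lookup stands in for trained models, and mapping "Hypertonie" to I10 is a toy simplification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Mention:
    text: str                      # surface form found by entity recognition
    start: int                     # character offsets in the document
    end: int
    concept: Optional[str] = None  # concept ID assigned by entity linking


def run_pipeline(
    doc: str,
    recognize: Callable[[str], List[Mention]],
    link: Callable[[Mention, str], Optional[str]],
    keep: Callable[[Mention, str], bool],
) -> List[Mention]:
    """Entity recognition -> entity linking -> entity filtering."""
    linked = []
    for mention in recognize(doc):
        mention.concept = link(mention, doc)
        if mention.concept is not None:  # mentions that cannot be linked are dropped
            linked.append(mention)
    return [m for m in linked if keep(m, doc)]


# Hypothetical toy components standing in for trained models.
TOY_CONCEPTS = {"hypertonie": "I10"}


def toy_recognize(doc: str) -> List[Mention]:
    lowered = doc.lower()
    mentions = []
    for term in TOY_CONCEPTS:
        start = lowered.find(term)
        if start != -1:
            mentions.append(Mention(doc[start:start + len(term)], start, start + len(term)))
    return mentions


def toy_link(mention: Mention, doc: str) -> Optional[str]:
    return TOY_CONCEPTS.get(mention.text.lower())


def toy_keep(mention: Mention, doc: str) -> bool:
    # Drop suspected diagnoses ("Verdacht auf" = "suspected").
    return "verdacht auf" not in doc[:mention.start].lower()


print(run_pipeline("Arterielle Hypertonie.", toy_recognize, toy_link, toy_keep))
print(run_pipeline("Verdacht auf arterielle Hypertonie.", toy_recognize, toy_link, toy_keep))
```

Keeping the stages as separate functions lets each one be swapped for a trained model independently.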
Challenges: Medical Domain is not News
Typical medical texts are very different from common NLP data such as news
● super condensed and short, sometimes just an enumeration
● full of abbreviations, acronyms, and technical terms
● ambiguity is often resolved through sheer domain knowledge, not necessarily by the local context
Indication: Acute hypoxia. Relapsed AML, GVHD, and renal failure with new hypoxia with clear chest x-ray.
Challenges: Ambiguity
Abbreviations are used for convenience
● ambiguous ones may cause miscommunication
● and potentially jeopardise patient care
Entity Linking needs to expand acronyms but must not rely on priors (e.g. always picking the most frequent expansion)
● TMZ: temazepam / temozolomide
● LFT: liver function test / lung function test
● HWI: Harnwegsinfekt (urinary tract infection) / Hinterwandinfarkt (posterior wall myocardial infarction)
● BCa: bladder cancer / breast cancer
● VF: Vorhofflimmern (atrial fibrillation) / Vorhofflattern (atrial flutter)
● MS: Magensonde (gastric tube) / Mitralstenose (mitral stenosis)
Holper et al., Ambiguous medical abbreviation study: challenges and opportunities, Intern Med J. 2020
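An abbreviation can be expanded without falling back on priors. One way is to score each candidate expansion against the surrounding words and abstain when the context does not decide. The sketch below assumes a hypothetical sense inventory whose cue words were hand-picked for illustration. A real system would learn such associations from data.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical sense inventory: abbreviation -> expansion -> supporting cue words.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "HWI": {
        "Harnwegsinfekt": {"urin", "urinkultur", "antibiose", "dysurie", "blase"},
        "Hinterwandinfarkt": {"ekg", "troponin", "stent", "koronar", "herzkatheter"},
    },
    "TMZ": {
        "temazepam": {"sleep", "insomnia", "sedation", "night"},
        "temozolomide": {"glioblastoma", "chemotherapy", "tumor", "radiotherapy"},
    },
}


def expand(abbreviation: str, context: str) -> Optional[str]:
    """Return the expansion best supported by the context, or None if undecided.

    Deliberately no prior: a tie (including zero evidence) yields None
    instead of the most frequent sense.
    """
    tokens = set(re.findall(r"\w+", context.lower()))
    scores = sorted(
        ((len(cues & tokens), sense) for sense, cues in SENSES.get(abbreviation, {}).items()),
        reverse=True,
    )
    if not scores or scores[0][0] == 0:
        return None
    if len(scores) > 1 and scores[0][0] == scores[1][0]:
        return None
    return scores[0][1]


print(expand("HWI", "V.a. HWI, Urinkultur abgenommen, Antibiose begonnen"))
print(expand("TMZ", "TMZ 10 mg"))
```

Returning None rather than guessing lets a downstream step, or a human, handle the undecided cases.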
Challenges: German
Latin-derived vs. Germanized spelling results in a bunch of variants
● Carcinom, Karcinom, Carzinom, Karzinom, Ca, CA
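A few orthographic rules can map such variants onto a single key. The sketch below assumes two simplified rules: "c" before a/o/u becomes "k", and "c" before e/i/y becomes "z". It also uses a hand-maintained abbreviation table. Both are illustrative assumptions, not a complete treatment of German medical orthography. "Ca" is itself ambiguous (it also denotes calcium), so in practice the table lookup needs context.

```python
import re

# Hypothetical abbreviation table; "ca" is ambiguous (carcinoma vs. calcium)
# and would need context in a real system.
ABBREVIATIONS = {"ca": "karzinom"}


def normalize_spelling(token: str) -> str:
    """Map Latin-style spellings onto a single Germanized lowercase key."""
    key = token.lower()
    if key in ABBREVIATIONS:
        return ABBREVIATIONS[key]
    # "ch" and "ck" stay untouched because h/k are not in the lookahead sets.
    key = re.sub(r"c(?=[aou])", "k", key)  # carcinom -> karcinom
    key = re.sub(r"c(?=[eiy])", "z", key)  # karcinom -> karzinom
    return key


variants = ["Carcinom", "Karcinom", "Carzinom", "Karzinom", "Ca", "CA"]
print({normalize_spelling(v) for v in variants})
```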
The notorious compound words
● sound perception disorder: Schallempfindungsstörung
● retinal artery occlusion: Netzhautarterienverschluss
● detection of tuberculosis: Tuberkulosenachweis
Decompounding is non-trivial and requires profound linguistic knowledge
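A common baseline for decompounding is lexicon-based splitting. It recursively finds a known head word, optionally skips a linking element (Fugenelement, e.g. -s- or -n-), and splits the rest. The tiny lexicon below is a hypothetical stand-in for a real medical vocabulary. Real systems also need to rank competing splits, which this greedy longest-head-first search does not do.

```python
import unicodedata
from typing import List, Optional

# Hypothetical mini-lexicon; a real one would hold many thousand medical terms.
LEXICON = {
    "schall", "empfindung", "störung",
    "netzhaut", "arterie", "verschluss",
    "tuberkulose", "nachweis",
}
LINKING_ELEMENTS = ("", "s", "es", "n", "en")  # German Fugenelemente
MIN_PART = 3


def _split(word: str) -> Optional[List[str]]:
    if word in LEXICON:
        return [word]
    # Try the longest known head first, then recurse on the remainder.
    for i in range(len(word) - 1, MIN_PART - 1, -1):
        head, rest = word[:i], word[i:]
        if head not in LEXICON:
            continue
        for link in LINKING_ELEMENTS:
            if rest.startswith(link):
                tail = _split(rest[len(link):])
                if tail:
                    return [head] + tail
    return None


def decompound(word: str) -> List[str]:
    """Split a German compound into lexicon words; unknown words stay whole."""
    key = unicodedata.normalize("NFC", word.lower())
    return _split(key) or [key]


print(decompound("Netzhautarterienverschluss"))
```

Note how the linking "n" in "Arterie-n-verschluss" is skipped rather than kept as part of a word.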
Entity Recognition (EN)
● data is available, e.g. BC5CDR (1,500 PubMed articles annotated with chemicals, diseases & their interactions)
● trained models are available
● not “solved”, but the state of the art is pretty good
https://scispacy.apps.allenai.org/
Entity Recognition (DE)
● typical off-the-shelf models are not useful for the medical domain
● need to train domain-specific models here
Data?
Real patient data
● resides in hospitals and medical practices
● is not publicly available
Public data
● netdoktor != Dr. B. Oss
● data in layman's language does not compare well to real medical texts
● may still help
Patient: “Ich habe im Moment keine Blutdruckprobleme” (“I don't have any blood pressure problems at the moment”)
Doctor: “RR gut eingestellt” (“BP well controlled”; RR = blood pressure, after Riva-Rocci)
Entity Recognition
Get data. Start annotating.
● entities are all concepts of interest: drugs, medical conditions, procedures, body parts, …
● annotation usually requires medical expert knowledge
● super specific vocabulary with lots of abbreviations & acronyms
● good to go after ~1k documents
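Annotation tools typically export entities as character-offset spans. Many sequence-labelling models, however, train on per-token BIO tags. Below is a minimal conversion sketch. It assumes regex tokenization, non-overlapping spans aligned to token boundaries, and a hypothetical label name (DIAGNOSIS).

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label) as character offsets


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Convert character-offset entity spans into (token, BIO tag) pairs."""
    tagged = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                # The first token of a span gets B-, the following ones I-.
                tag = ("B-" if match.start() == start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


sentence = "Intrazerebrale Blutung konnte nicht bestätigt werden."
print(spans_to_bio(sentence, [(0, 22, "DIAGNOSIS")]))
```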
Entity Recognition
Train your own model + data
Entity Linking
Most work in research: link entity mentions to concepts in the medical thesaurus UMLS
● higher-level metadata enrichment
● index new publications by topic & keywords
● hot topic with plenty of publications
Why not use UMLS here?
● no German version (yet)
● concepts are sometimes not specific enough
Murty et al., Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking, ACL 2018
Kolitsas et al., End-to-End Neural Entity Linking, CoNLL 2018
Mohan & Li, MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts, AKBC 2019
ICD-10 Linking
ICD: International Statistical Classification of Diseases and Related Health Problems
● catalogs mental and physical disorders in their most specific and precise form
● global standard for clinical documentation and billing
● published yearly by the WHO
● … comes with a German modification: ICD-10-GM (BfArM)
https://icd.who.int/browse10/2019/en
https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
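ICD-10 codes are hierarchical. A letter plus two digits forms the three-character category (e.g. N17), and digits after the dot refine it; the German modification goes down to a second digit, as in a five-character code such as E11.90. The helper below is a simplified sketch that expands a code into its ancestor levels. Its regular expression is an assumption: it covers only the plain code format and ignores any additional markers the catalog may attach.

```python
import re
from typing import List

# Simplified pattern: letter + two digits, optionally "." + one or two digits.
ICD10_CODE = re.compile(r"^([A-Z]\d{2})(?:\.(\d{1,2}))?$")


def icd_levels(code: str) -> List[str]:
    """Return the code together with its ancestors, from coarse to fine."""
    match = ICD10_CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a plain ICD-10 code: {code!r}")
    category, detail = match.group(1), match.group(2) or ""
    levels = [category]
    for i in range(1, len(detail) + 1):
        levels.append(f"{category}.{detail[:i]}")
    return levels


print(icd_levels("E11.90"))
```

Wildcard notations like "N17*" used on these slides denote a whole category and are rejected by this parser.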
ICD-10 Linking
Higher clinical relevance
● support doctors: can’t get much more specific than with a diagnosis code
● support medical controllers: ICD codes, not UMLS concepts, are the items used in billing
Requires entity filtering to avoid false positives
● excluded or suspected diagnoses
● “status post” diagnoses: clinically relevant but not billing relevant
EHR excerpt:
Keine Hinweis auf intrazerebrale Blutung. (No evidence of intracerebral hemorrhage.)
Z.n. Hysterektomie, 2006 (Status post hysterectomy, 2006)
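Entity Filtering for primary coding
Most mentions may be clinically relevant, but not coding relevant. Relation extraction approaches are needed here.
Prostatacarcinom in der Vorgeschichte (history of prostate carcinoma)
Vorbekannte Osteochondrose (known osteochondrosis)
Z.n. mehrfachem Apoplexen, zuletzt 2006 (status post multiple strokes, most recently 2006)
Mamma-Ca wurde ausgeschlossen. (Breast cancer was ruled out.)
Keine Hinweis auf intrazerebrale Blutung. (No evidence of intracerebral hemorrhage.)
Die BWK 9-Fraktur zeigte sich mit fehlender knöcherner Durchbauung im Sinne einer Pseudarthrose. (The T9 vertebral fracture showed absent bony union, consistent with pseudarthrosis.)
Intrazerebrale Blutung konnte nicht bestätigt werden. (Intracerebral hemorrhage could not be confirmed.)
Verdacht auf arterielle Hypertonie. (Suspected arterial hypertension.)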
Toy example. Typical cases are much more complex.
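Before investing in relation extraction, a trigger-based baseline in the spirit of NegEx can already assign each sentence a status. The cue patterns below are hypothetical and were derived only from the toy sentences above. Treating only asserted findings as relevant for primary coding is also an assumption for illustration. The baseline works at sentence level and cannot tell which of several diagnoses in a sentence a cue refers to, which is exactly where relation extraction comes in.

```python
import re

# Hypothetical German cue patterns, checked in this order.
CUES = [
    ("negated", [r"\bausgeschlossen\b", r"\bkeine?\s+hinweis", r"\bnicht\s+bestätigt"]),
    ("suspected", [r"\bverdacht\s+auf\b", r"\bv\.\s?a\."]),
    ("history", [r"\bz\.\s?n\.", r"\bin\s+der\s+vorgeschichte\b", r"\bvorbekannt"]),
]


def diagnosis_status(sentence: str) -> str:
    """Classify a sentence as negated, suspected, history, or asserted."""
    lowered = sentence.lower()
    for status, patterns in CUES:
        if any(re.search(p, lowered) for p in patterns):
            return status
    return "asserted"


def keep_for_primary_coding(sentence: str) -> bool:
    # Assumption for illustration: only asserted, current findings are kept.
    return diagnosis_status(sentence) == "asserted"


print(diagnosis_status("Mamma-Ca wurde ausgeschlossen."))
```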
ICD-10 Linking
To be really useful, the link must be super specific
● “some renal failure” (N17*) is not good enough
Specificity relates to the stage of the disease
● hugely affects treatment complexity and care intensity
● treatment complexity directly corresponds to the hospital’s bill sent to the insurance company
https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
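One way to enforce specificity is to accept only terminal codes, i.e. codes with no more specific children in the catalog, and to send anything coarser back for refinement. Below is a minimal sketch, assuming the catalog is available as a set of code strings. The toy set lists only the WHO ICD-10 subdivisions of N17; ICD-10-GM refines further, e.g. by stage.

```python
from typing import List, Set

# Toy excerpt: the N17 category and its WHO ICD-10 subcategories.
CATALOG = {"N17", "N17.0", "N17.1", "N17.2", "N17.8", "N17.9"}


def children(code: str, catalog: Set[str]) -> List[str]:
    """All catalog codes that refine `code` (e.g. N17 -> N17.0, N17.1, ...)."""
    prefix = code if "." in code else code + "."
    return sorted(c for c in catalog if c != code and c.startswith(prefix))


def is_specific_enough(code: str, catalog: Set[str]) -> bool:
    """Accept a link only if it points to a terminal (leaf) code."""
    return not children(code, catalog)


print(is_specific_enough("N17", CATALOG), is_specific_enough("N17.0", CATALOG))
```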
Specificity
To describe a disease in a certain stage or manifestation, the catalog is super specific
● 40 entries each for different instances of Diabetes mellitus Type 1 and Type 2
● there are even more forms of diabetes...
The difference is sometimes only one word
● “nicht” (not) or “mit/ohne” (with/without): the usual stopwords are dangerous here!
https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
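The danger is easy to demonstrate. General-purpose German stopword lists commonly contain words such as "mit", "ohne" and "nicht". The sketch below uses a small hypothetical stopword set and two illustrative description strings, not verbatim catalog entries. It shows that removing these words collapses distinct entries into identical token sets, and that protecting them keeps the entries apart.

```python
import re
from typing import FrozenSet, Set

# Small hypothetical stopword set in the style of general-purpose German lists.
STOPWORDS = {"der", "die", "das", "und", "mit", "ohne", "nicht", "als"}
# Words that change the meaning of a catalog entry must never be dropped.
PROTECTED = {"mit", "ohne", "nicht"}


def content_tokens(text: str, stopwords: Set[str]) -> FrozenSet[str]:
    return frozenset(t for t in re.findall(r"\w+", text.lower()) if t not in stopwords)


with_compl = "Diabetes mellitus mit Komplikationen"
without_compl = "Diabetes mellitus ohne Komplikationen"

naive = STOPWORDS
safe = STOPWORDS - PROTECTED
print(content_tokens(with_compl, naive) == content_tokens(without_compl, naive))  # True
print(content_tokens(with_compl, safe) == content_tokens(without_compl, safe))    # False
```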
Precision vs Context
ICD is completely different from Wikipedia
● catalog entries are precise descriptions without further context
● descriptions are not the most commonly used names
● descriptions tend to be long: the median length is 5 words, the maximum is 28
● typically not used in this form by doctors: low character overlap, low similarity
... RR 150/90...
... rezidiv. Bluthochdruck mit Schwächegefühl... (recurrent high blood pressure with a feeling of weakness)
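The low surface overlap can be quantified, for instance with the Jaccard similarity of character trigram sets. The measure is an illustrative choice, not one prescribed by the slides. In this sketch a typo still shares most trigrams with the catalog wording, while the doctor's synonym shares none.

```python
from typing import Set


def char_trigrams(text: str) -> Set[str]:
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over character trigram sets (0.0 = no overlap)."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


print(trigram_jaccard("diabetes meltus", "Diabetes mellitus"))  # typo: high overlap
print(trigram_jaccard("Bluthochdruck", "Hypertonie"))          # synonym: no overlap
```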
About Context...
Disambiguating information need not be located in the discharge letter
● it can even be in a completely different data format, e.g. lab measurements
● N18*: depends on multiple measurements of a specific lab value (creatinine)
● not an NLP task anymore, but time series analysis?
https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm
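Treating this as time series analysis could start with a simple rule over dated lab values. The sketch assumes eGFR values have already been derived from creatinine; the estimating equation is out of scope here. It uses the KDIGO GFR thresholds and requires persistently reduced values over at least 90 days. This is an illustration only, not a coding or clinical rule.

```python
from datetime import date
from typing import List, Optional, Tuple


def gfr_stage(egfr: float) -> int:
    """KDIGO GFR category as a stage number (G3a/G3b merged into 3)."""
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5


def suggest_ckd_stage(series: List[Tuple[date, float]], min_days: int = 90) -> Optional[int]:
    """Suggest a CKD stage from dated eGFR values (ml/min/1.73 m²).

    Returns None unless every value is below 60 and the measurements span at
    least `min_days`; stages 1-2 would need further evidence of kidney damage.
    """
    if len(series) < 2:
        return None
    series = sorted(series)
    if (series[-1][0] - series[0][0]).days < min_days:
        return None
    if any(value >= 60 for _, value in series):
        return None
    # Conservative choice: use the best (highest) value in the window.
    return gfr_stage(max(value for _, value in series))


labs = [(date(2020, 1, 10), 42.0), (date(2020, 3, 2), 38.0), (date(2020, 5, 20), 40.0)]
print(suggest_ckd_stage(labs))
```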
Entity Linking in Practice
Go-to solution for candidate retrieval: an inverted index over catalog descriptions
● basically a vector space model with cosine similarity between (query, entry)
● make use of the analyzers that come with Lucene for tokenization, stemming, etc.
Secret sauce
● add medical knowledge and extend the descriptions (e.g. synonyms)
● hand-craft the search query from the mention context
Gist: aim for high recall; you can’t link what you don’t find...
Pilz & Paaß, Collective Search for Concept Disambiguation, COLING 2012
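In pure Python, the retrieval step might look roughly like the sketch below: an inverted index from character trigrams to catalog codes, TF-IDF weights, and cosine ranking. It is a stand-in for Lucene, not a use of Lucene's API. Character trigrams are an assumed way of getting typo tolerance. The toy catalog (shortened category titles) and the synonym table are illustrative only; they show the "secret sauce" of extending descriptions.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def trigrams(text: str) -> List[str]:
    """Character trigrams per word; '#' marks word boundaries."""
    grams: List[str] = []
    for word in re.findall(r"\w+", text.lower()):
        padded = f"#{word}#"
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


class CatalogIndex:
    """Inverted index over catalog descriptions with TF-IDF cosine ranking."""

    def __init__(self, catalog: Dict[str, str], synonyms: Optional[Dict[str, List[str]]] = None):
        synonyms = synonyms or {}
        self.postings: Dict[str, Dict[str, int]] = defaultdict(dict)  # term -> {code: tf}
        for code, description in catalog.items():
            # Index synonyms together with the official description.
            text = " ".join([description, *synonyms.get(code, [])])
            for term, tf in Counter(trigrams(text)).items():
                self.postings[term][code] = tf
        n_docs = len(catalog)
        # Smoothed IDF so that no weight becomes zero.
        self.idf = {t: math.log(1 + n_docs / len(p)) for t, p in self.postings.items()}
        self.norms: Dict[str, float] = defaultdict(float)
        for term, posting in self.postings.items():
            for code, tf in posting.items():
                self.norms[code] += (tf * self.idf[term]) ** 2

    def search(self, query: str, k: int = 5) -> List[str]:
        scores: Dict[str, float] = defaultdict(float)
        for term, q_tf in Counter(trigrams(query)).items():
            for code, tf in self.postings.get(term, {}).items():
                scores[code] += q_tf * tf * self.idf[term] ** 2
        # Dividing by the document norm gives cosine ranking; the query norm is
        # identical for all codes and can be left out.
        return sorted(scores, key=lambda c: scores[c] / math.sqrt(self.norms[c]), reverse=True)[:k]


# Toy catalog (shortened category titles) and synonym table, illustrative only.
TOY_CATALOG = {
    "E10": "Diabetes mellitus, Typ 1",
    "E11": "Diabetes mellitus, Typ 2",
    "N17": "Akutes Nierenversagen",
    "N18": "Chronische Nierenkrankheit",
    "I10": "Essentielle (primäre) Hypertonie",
}
TOY_SYNONYMS = {"N17": ["ANV"], "I10": ["Bluthochdruck"]}

index = CatalogIndex(TOY_CATALOG, TOY_SYNONYMS)
print(index.search("diabetes meltus", k=2))
print(index.search("ANV 3"))
```

Scoring only codes that share at least one trigram with the query keeps the search cheap. Recall then depends on how well the descriptions are extended.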
Demo
Can handle typos and spelling variations.
Query: “diabetes meltus” fetches all codes for Diabetes mellitus.
Demo
Can handle alternative names like synonyms or acronyms.
Query: “ANV 3” fetches all “Akutes Nierenversagen ... Stadium 3” (acute renal failure ... stage 3) codes
But which one is it? It cannot decide on the best candidate...
Best Candidate?
Recipe: rank by context similarity to decide on the best candidate
● find expressive vector representations of mention-candidate pairs
  ○ word2vec
  ○ topic distributions (LDA)
  ○ graphical similarity …
● plug the vectors into a ranking model
  ○ Ranking SVM
  ○ specific loss functions in neural networks (Hamming)
But as we have seen, the catalog does not provide extensive descriptions, so... next time!
Pilz & Paaß, From names to entities using thematic context distance, CIKM 2011
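Below is a deliberately simple stand-in for the ranking step. It scores each candidate by the cosine similarity between bag-of-words vectors of the mention context and the candidate description. Raw token counts replace the learned representations and ranking models listed above. The three candidate descriptions and their keys are hypothetical paraphrases, not catalog text. With such short descriptions there is, as noted above, little to match against.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(context: str, candidates: Dict[str, str]) -> List[str]:
    """Order candidate IDs by similarity of their description to the context."""
    ctx = bag_of_words(context)
    return sorted(candidates, key=lambda c: cosine(ctx, bag_of_words(candidates[c])), reverse=True)


# Hypothetical candidates, as if retrieved for the query "ANV 3".
CANDIDATES = {
    "tubular": "Akutes Nierenversagen mit Tubulusnekrose, Stadium 3",
    "cortical": "Akutes Nierenversagen mit akuter Rindennekrose, Stadium 3",
    "medullary": "Akutes Nierenversagen mit Marknekrose, Stadium 3",
}
context = "ANV Stadium 3 bei Tubulusnekrose nach Kontrastmittelgabe"
print(rank_candidates(context, CANDIDATES))
```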
Thanks!
Questions?
Say Hi!
