Biomedical Entity Linking
Introduction, approaches, challenges
About me
● PhD in machine learning & natural language processing
from University of Bonn & Fraunhofer IAIS
● Now in industry: AI and data driven products, since 2016
mostly in the medical and healthcare domain
● Main interests: NLP, especially German; information
retrieval; recommender systems
@anja_pilz
aplz
Outline
● Motivation: why do we need entity linking
○ Ambiguity, use cases
● Entity linking in the biomedical domain
○ Data and ontologies, main challenges
● Technical problem: 3 stage task
○ Approaches for each of these stages (sketches & references)
● Short preview of challenges with German data
Language is ambiguous
“with steroid induced diabetes, I lost a stone
in three days, it was grim”
Candidate senses for the ambiguous words:
● diabetes: Type II diabetes, Type I diabetes, gestational diabetes, steroid diabetes
● stone: gallstones, kidney stones, the stone (unit), a random stone
● grim: grim protein (Drosophila)
● steroid: variants of steroids
Why do we care?
Need to resolve ambiguity to
● avoid mistakes in patient-doctor communication
○ specialist vs layman vocabulary
● automatically retrieve important information
○ side effects of drugs discussed in online patient fora
● enrich electronic health records (EHR)
○ links to the newest research, treatment guidelines or other linked open data (LOD) resources
And many more reasons…
Entity linking resolves ambiguity by assigning each mention its underlying “sense”.
Biomedical Entity Linking
Mentions: textual references to medical terms like
diagnoses, treatments, body parts, drugs, ...
Entities: entries in (curated) medical ontologies
Example mentions (from layman terms, EHR and specialist vocabulary):
Headache, Cephalgia, Migraine, Head Pain, Cranial Pain
Example entity: Headache (D006261)
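As a minimal sketch of what "assigning each mention its underlying sense" means for the headache example, the snippet below maps a few of the surface forms from the slide to the identifier shown there. The alias table and the `lookup` helper are hypothetical illustrations, not an excerpt from any ontology.

```python
# Hypothetical alias table: surface form -> entity identifier.
# The identifier D006261 is the one shown on the slide for "Headache".
ALIASES = {
    "headache": "D006261",
    "cephalgia": "D006261",
    "head pain": "D006261",
    "cranial pain": "D006261",
}


def normalize(mention):
    """Lowercase the mention and collapse runs of whitespace."""
    return " ".join(mention.lower().split())


def lookup(mention):
    """Return the entity identifier for a mention, or None if it is unknown."""
    return ALIASES.get(normalize(mention))


print(lookup("Cephalgia"))
print(lookup("Head  Pain"))
print(lookup("stone"))
```

A plain lookup like this only works for unambiguous surface forms; ambiguous words such as "stone" need the context-based stages described later.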
Example: excerpt from a PubMed abstract linked to UMLS (Unified Medical Language
System)
Biomedical Entity Linking
Mohan & Li, MedMentions: A Large Biomedical Corpus
Annotated with UMLS Concepts, AKBC 2019
The technique does not
require contrast material, so
it can safely be used in
patients with renal failure.
Why is that hard?
● Notion of uniqueness: a disease is
rendered unique by the person it affects
(and the stage)
● Uniqueness heavily affects linkability:
which stage of renal failure is meant?
○ candidates “look” super similar
○ might even need additional resources (lab)
Acute renal failure: Her baseline Cr is
1.8. On presentation the Cr had
increased to 7.7 secondary to the
bilateral hydronephrosis.
https://icd.who.int/browse11/l-m/en
Johnson et al., MIMIC-III, a freely accessible
critical care database. Scientific Data 2016
Given some text document, find all spans of words m that mention some entity e and
assign each span to a unique identifier (entry in a KB).
Technical Problem
Entity Recognition: detect spans to be linked
(Sequence Tagging)
Candidate Retrieval: find all relevant candidates in a KB
(Information Retrieval)
Candidate Ranking: decide on the best candidate
(Ranking Task)
Error propagation: mistakes made in one stage carry over to all later stages
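The three stages can be sketched as a chain of functions. This is only an illustration under assumptions: the `Mention` class, the stage signatures, the toy knowledge base and its identifiers are all hypothetical. It mainly shows why errors propagate, since each stage only sees what the previous one produced.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Mention:
    start: int  # character offset where the span begins
    end: int    # character offset one past the end of the span
    text: str


# Hypothetical stage signatures, not taken from any library.
Recognizer = Callable[[str], list[Mention]]
Retriever = Callable[[Mention, str], list[str]]
Ranker = Callable[[Mention, str, list[str]], Optional[str]]


def link_entities(text, recognize, retrieve, rank):
    """Run the three stages in order and return (mention, entity id) pairs."""
    links = []
    for mention in recognize(text):            # stage 1: missed spans are never linked
        candidates = retrieve(mention, text)   # stage 2: bounds what stage 3 can pick
        if not candidates:
            links.append((mention, None))
            continue
        links.append((mention, rank(mention, text, candidates)))  # stage 3
    return links


# Toy stages over a made-up knowledge base.
TOY_KB = {"renal failure": ["KB:0001", "KB:0002"]}


def toy_recognize(text):
    lowered = text.lower()
    mentions = []
    for name in TOY_KB:
        pos = lowered.find(name)
        if pos >= 0:
            mentions.append(Mention(pos, pos + len(name), text[pos:pos + len(name)]))
    return mentions


def toy_retrieve(mention, text):
    return TOY_KB.get(mention.text.lower(), [])


def toy_rank(mention, text, candidates):
    return candidates[0]  # placeholder: always take the first candidate


links = link_entities(
    "It can safely be used in patients with renal failure.",
    toy_recognize, toy_retrieve, toy_rank,
)
print(links)
```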
Step 1: Entity Recognition
Goal: detect diagnoses, measurements, procedures in the text of the EHR
● supervised: train a sequence tagging model
○ pick a model: lots of literature, but mostly some variant of a Bi-LSTM-CRF
○ (manually) annotate data
● pro: domain adaptation & custom features
● con: requires training data & medical expertise
Roller et al., Detecting Named Entities and Relations
in German Clinical Reports, GSCL 2017
Murty et al., Hierarchical Losses and New
Resources for Fine-grained Entity Typing and
Linking, ACL 2018
Lample et al., Neural Architectures for Named Entity
Recognition, NAACL-HLT 2016
Indication: Acute hypoxia. Relapsed AML,
GVHD, and renal failure with new hypoxia with
clear chest x-ray.
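Sequence taggers are usually trained on token-level labels, so the manual annotations have to be converted first. The sketch below turns character spans into BIO tags for part of the example sentence. The tokenizer, the single `DIAGNOSIS` label and the annotated spans are assumptions made for illustration, not the annotation scheme of any of the cited papers.

```python
import re


def tokenize(text):
    """Split into word and punctuation tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def spans_to_bio(text, spans):
    """Convert non-overlapping (start, end, label) character spans to BIO tags."""
    tagged = []
    for token, start, end in tokenize(text):
        tag = "O"
        for span_start, span_end, label in spans:
            if start >= span_start and end <= span_end:
                # B- marks the first token of a span, I- every following token.
                tag = ("B-" if start == span_start else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


text = "Relapsed AML, GVHD, and renal failure with new hypoxia"


def span_of(phrase, label="DIAGNOSIS"):
    """Helper for this example: locate the first occurrence of a phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


# Hypothetical manual annotations.
spans = [span_of("AML"), span_of("GVHD"), span_of("renal failure"), span_of("hypoxia")]
tagged = spans_to_bio(text, spans)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```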
Step 1: Entity Recognition
Goal: detect diagnoses, measurements, procedures in the text of the EHR
● weakly labeled: keyword matching
○ walk over text and lookup every span in a dictionary
○ keep all spans that have at least one entity candidate
● pro: no need to annotate data
● con: noise, type and recall issues
Murty et al., Hierarchical Losses and New
Resources for Fine-grained Entity Typing and
Linking, ACL 2018
Kolitsas et al., End-to-End Neural Entity Linking,
CoNLL 2018
Wiatrak, Iso-Sipilä. Simple Hierarchical Multi-Task
Neural End-To-End Entity Linking for Biomedical
Text. LOUHI@EMNLP 2020
Indication: Acute hypoxia. Relapsed AML,
GVHD, and renal failure with new hypoxia with
clear chest x-ray.
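The keyword-matching variant can be sketched in a few lines: walk over all token spans up to a maximum length, look each one up in a dictionary, and keep the spans with at least one candidate. The dictionary entries, the identifiers and the span limit below are made up for illustration.

```python
import re

# Toy dictionary: lowercased surface form -> candidate entity ids (made up).
DICTIONARY = {
    "hypoxia": ["KB:hypoxia"],
    "renal failure": ["KB:renal_failure_acute", "KB:renal_failure_chronic"],
    "aml": ["KB:acute_myeloid_leukemia", "KB:angiomyolipoma"],
    "chest x-ray": ["KB:chest_radiograph"],
}
MAX_SPAN_TOKENS = 3  # assumed upper bound on mention length


def match_spans(text, dictionary=DICTIONARY, max_len=MAX_SPAN_TOKENS):
    """Return (start, end, surface, candidates) for every span found in the dictionary."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[\w-]+", text)]
    matches = []
    for i in range(len(tokens)):
        for j in range(i, min(i + max_len, len(tokens))):
            start, end = tokens[i][0], tokens[j][1]
            surface = text[start:end]
            candidates = dictionary.get(surface.lower())
            if candidates:  # keep spans with at least one entity candidate
                matches.append((start, end, surface, candidates))
    return matches


text = ("Indication: Acute hypoxia. Relapsed AML, GVHD, and renal failure "
        "with new hypoxia with clear chest x-ray.")
matches = match_spans(text)
for start, end, surface, candidates in matches:
    print(start, end, surface, candidates)
```

In this toy run "GVHD" is missed because it is not in the dictionary (a recall issue), and "AML" comes back with two candidates that only context can tell apart (noise).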
Step 2: Candidate Retrieval
Goal: fetch all relevant candidate entities from the ontology
● upper bound on performance: you can’t link what you
don’t find
Go-to solution: inverted index (Lucene) over entity descriptions
● make use of the analyzers that ship with Lucene for
tokenization, stemming, etc.
● craft a search query from the mention context
● keep the top k hits (e.g. 5, 10 or 100) as candidates
Pilz & Paaß, Collective Search for Concept
Disambiguation, COLING 2012
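To make the idea concrete without depending on Lucene, here is a small pure-Python inverted index with a simple TF-IDF style score. It is a stand-in, not Lucene's API or scoring function; the `analyze` function, the entity descriptions and the identifiers are assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def analyze(text):
    """Stand-in for an analyzer: lowercase and split on non-letters (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


class InvertedIndex:
    def __init__(self, descriptions):
        self.postings = defaultdict(dict)  # term -> {entity id: term frequency}
        self.num_docs = len(descriptions)
        for entity_id, description in descriptions.items():
            for term, tf in Counter(analyze(description)).items():
                self.postings[term][entity_id] = tf

    def search(self, query, top_k=10):
        """Return the ids of the top_k entities, best match first."""
        scores = defaultdict(float)
        for term in set(analyze(query)):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + self.num_docs / len(docs))  # rarer terms weigh more
            for entity_id, tf in docs.items():
                scores[entity_id] += tf * idf
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [entity_id for entity_id, _ in ranked[:top_k]]


# Made-up entity descriptions.
DESCRIPTIONS = {
    "KB:acute_renal_failure": "acute renal failure sudden loss of kidney function",
    "KB:chronic_renal_failure": "chronic renal failure gradual loss of kidney function",
    "KB:heart_failure": "heart failure reduced pumping of the heart",
}
index = InvertedIndex(DESCRIPTIONS)

# Query crafted from the mention and a few context words.
candidates = index.search("acute renal failure baseline Cr increased", top_k=2)
print(candidates)
```

With `top_k=2` the heart failure entry is cut off. If it had been the correct target, no ranking model could recover it, which is the upper bound mentioned above.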
Step 3: Candidate Ranking
Goal: decide on the best candidate as target entity
Rank by context similarity
● compare text representations of mention context and
entity description (word2vec, topic distributions, etc)
● but: medical ontologies often do not provide extensive
descriptions
Pilz & Paaß, From names to entities using thematic
context distance, CIKM 2011
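Ranking by context similarity can be sketched with the simplest possible text representation, a bag of words compared by cosine similarity. This replaces the word2vec or topic-distribution representations named above with word counts purely to keep the example self-contained. The descriptions and identifiers are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word counts over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_context(context, descriptions):
    """Return (score, entity id) pairs, most similar description first."""
    ctx = bag_of_words(context)
    scored = [(cosine(ctx, bag_of_words(desc)), eid) for eid, desc in descriptions.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Made-up descriptions for two senses of "stone".
descriptions = {
    "KB:stone_unit": "unit of body weight equal to fourteen pounds; weight lost or gained",
    "KB:kidney_stone": "hard mineral deposit that forms inside the kidney",
}
ranking = rank_by_context("I lost a stone in three days", descriptions)
print(ranking)
```

An entity with an empty description always scores 0.0 here, which illustrates why sparse ontology descriptions are a problem for this kind of ranking.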
Step 3: Candidate Ranking
Goal: decide on the best candidate as target entity
Add type similarity from hierarchies
● Wikipedia: categories assigned to entities
● UMLS: use semantic types
○ distinguish a disease from the gene it is caused by
○ LATTE: reports a boost in linking performance when adding a type
encoding learned from UMLS types
Zhu et al., LATTE: Latent Type Modeling for
Biomedical Entity Linking, AAAI 2020
UMLS® Reference Manual
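One simple way to use type information is to mix a type-compatibility score into the context score. The sketch below is not the LATTE model; it assumes a hypothetical type predictor that outputs a probability per semantic type for the mention, and the candidates, scores and identifiers are invented. The two type names are UMLS semantic types.

```python
def type_score(predicted_types, candidate_types):
    """Probability mass the (assumed) type predictor assigns to the candidate's types."""
    return sum(predicted_types.get(t, 0.0) for t in candidate_types)


def rerank(candidates, predicted_types, type_weight=0.5):
    """Combine context and type scores linearly; return (score, id), best first."""
    scored = []
    for cand in candidates:
        combined = ((1 - type_weight) * cand["context_score"]
                    + type_weight * type_score(predicted_types, cand["types"]))
        scored.append((combined, cand["id"]))
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# A disease and the gene it is caused by look almost identical from context alone.
candidates = [
    {"id": "KB:nf1_gene", "types": {"Gene or Genome"}, "context_score": 0.42},
    {"id": "KB:nf1_disease", "types": {"Disease or Syndrome"}, "context_score": 0.40},
]
# Hypothetical output of a type predictor for the mention in its context.
predicted = {"Disease or Syndrome": 0.8, "Gene or Genome": 0.2}

ranking = rerank(candidates, predicted)
print(ranking)
```

With `type_weight=0.0` the gene wins on context score alone; adding the type signal flips the ranking towards the disease.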
Step 3: Candidate Ranking
Goal: decide on the best candidate as target entity
In a nutshell
● find expressive vector representations of mention-candidate pairs
● plug vectors into some function to rank them
○ Ranking SVM, specific loss functions in NN, …
● the information in the vector is more important than the algorithm!
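The "vectors plus ranking function" recipe can be illustrated with a linear scorer trained on a pairwise hinge loss, which is the kind of objective a Ranking SVM optimizes. The three features (context similarity, type score, exact name match), the training pairs and the hyperparameters are assumptions made for this sketch.

```python
def score(weights, features):
    """Linear ranking function: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(examples, num_features, epochs=50, lr=0.1, margin=1.0):
    """examples: list of (gold candidate features, [features of other candidates])."""
    weights = [0.0] * num_features
    for _ in range(epochs):
        for gold, others in examples:
            for other in others:
                # Pairwise hinge loss: max(0, margin - (score(gold) - score(other))).
                if score(weights, gold) - score(weights, other) < margin:
                    for i in range(num_features):
                        weights[i] += lr * (gold[i] - other[i])
    return weights


# Feature vectors: [context similarity, type score, exact name match] (invented values).
examples = [
    ([0.40, 0.8, 1.0], [[0.42, 0.2, 1.0], [0.10, 0.1, 0.0]]),
    ([0.70, 0.9, 0.0], [[0.60, 0.1, 1.0]]),
]
weights = train_pairwise(examples, num_features=3)
print(weights)


def best_candidate(weights, candidates):
    """Index of the highest-scoring candidate feature vector."""
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))
```

In this toy data the type score is the only feature that separates every pair, so it receives the largest weight, in line with the point that the information in the vector matters more than the algorithm.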
Challenges with German data
● Data is scarce, nothing comparable to MIMIC-III or MedMentions exists
● Ontologies like UMLS are only available in English
● NLP for German is a tad harder
○ Common nouns look like named entities (upper case)
● … the notorious compound words
○ sound perception disorder (sensorineural hearing loss): Schallempfindungsstörung
○ occlusion of the central retinal artery: Netzhautarterienverschluss
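A common first step for handling compounds is dictionary-based splitting. The recursive sketch below splits the two example words into known parts and skips German linking elements such as "s" or "n" between them. The tiny lexicon and the list of linking elements are assumptions; a real system would need a large lexicon and a way to choose between competing splits.

```python
# Toy lexicon of known word parts (assumed, lowercased).
LEXICON = {"schall", "empfindung", "störung", "netzhaut", "arterie", "verschluss"}
# Linking elements that may appear between parts; "" means no linking element.
LINKING_ELEMENTS = ("", "s", "n", "en", "es")


def split_compound(word, lexicon=LEXICON):
    """Return the lexicon parts covering the word, or None if no split is found."""
    word = word.lower()
    if not word:
        return []
    # Try longer heads first so that longer known parts are preferred.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        for link in LINKING_ELEMENTS:
            if rest.startswith(link):
                tail = split_compound(rest[len(link):], lexicon)
                if tail is not None:
                    return [head] + tail
    return None


print(split_compound("Schallempfindungsstörung"))
print(split_compound("Netzhautarterienverschluss"))
print(split_compound("Kopfschmerz"))
```

The split parts can then be looked up or translated individually, which helps when the full compound is missing from the ontology.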
Ideas?
Let’s discuss!

Biomedical Entity Linking - Introduction, approaches, challenges

  • 2. About me ● PhD in machine learning & natural language processing from University of Bonn & Fraunhofer IAIS ● Now in industry: AI and data driven products, since 2016 mostly in the medical and healthcare domain ● Main interests: NLP, especially German; information retrieval; recommender systems @anja_pilz aplz
  • 3. Outline ● Motivation: why do we need entity linking ○ Ambiguity, use cases ● Entity linking in the biomedical domain ○ Data and ontologies, main challenges ● Technical problem: 3 stage task ○ Approaches for each of these stages (sketches & references) ● Short preview of challenges with German data
  • 4. Language is ambiguous “with steroid induced diabetes, I lost a stone in three days, it was grim” Type II diabetes Type I diabetes Gestational diabetes Steroid diabetes gallstones kidney stones the stone (unit) a random stone grim protein, Drosophila variants of steroids
  • 5. Why do we care? Need to resolve ambiguity to ● avoid mistakes in patient-doctor communication ○ specialist vs layman vocabulary ● automatically retrieve important information ○ side effects of drugs discussed in online patient fora ● enrich electronic health records (EHR) ○ links to newest research, treatment guidelines or other LOD resources And many more reasons… Entity linking resolves ambiguity by assigning each mention its underlying “sense”.
  • 6. Biomedical Entity Linking Mentions: textual references of medical terms like diagnoses, treatments, body parts, drugs, ... Entities: entries in (curated) medical ontologies. Example mentions (layman terms, EHR, specialist vocabulary): Headache, Cephalgia, Migraine, Head Pain, Cranial Pain. Example entity: Headache (D006261)
  • 7. Example: excerpt from a PubMed abstract linked to UMLS (Unified Medical Language System) Biomedical Entity Linking Mohan & Li, MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts, AKBC 2019 The technique does not require contrast material, so it can safely be used in patients with renal failure.
  • 8. Why is that hard? ● Notion of uniqueness: a disease is rendered unique by the person it affects (and the stage) ● Uniqueness heavily affects linkability: which stage of renal failure is meant? ○ candidates “look” super similar ○ might even need additional resources (lab) Acute renal failure: Her baseline Cr is 1.8. On presentation the Cr had increased to 7.7 secondary to the bilateral hydronephrosis. https://icd.who.int/browse11/l-m/en Johnson et al., MIMIC-III, a freely accessible critical care database. Scientific Data 2016
  • 9. Technical Problem Given some text document, find all spans of words m that mention some entity e and assign each span to a unique identifier (entry in a KB). ● Entity Recognition: detect spans to be linked (Sequence Tagging) ● Candidate Retrieval: find all relevant candidates in a KB (Information Retrieval) ● Candidate Ranking: decide on the best candidate (Ranking Task) Error propagation: mistakes in one stage carry over to the next.
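The three stages can be chained as in the minimal sketch below. The function name `link_entities` and the three callables are hypothetical placeholders, and the toy knowledge base and entity ids are made up for illustration. The point is only to show where errors propagate: a mention that is not recognized, or for which retrieval returns nothing, never reaches the ranking stage.

```python
def link_entities(text, recognize, retrieve, rank):
    """Chain the three stages; all three callables are assumed, not prescribed."""
    links = []
    for mention in recognize(text):             # stage 1: entity recognition
        candidates = retrieve(mention, text)    # stage 2: candidate retrieval
        if not candidates:
            continue                            # retrieval miss: mention stays unlinked
        links.append((mention, rank(mention, text, candidates)))  # stage 3: ranking
    return links


# Toy stand-ins for the three stages (hypothetical data, made-up ids).
toy_kb = {"renal failure": ["E2", "E3"]}
recognize = lambda text: [m for m in ("renal failure", "hypoxia") if m in text.lower()]
retrieve = lambda mention, text: toy_kb.get(mention, [])
rank = lambda mention, text, candidates: candidates[0]

links = link_entities("Renal failure with new hypoxia", recognize, retrieve, rank)
```

Here "hypoxia" is recognized but has no entry in the toy knowledge base, so it is dropped before ranking.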
  • 10. Step 1: Entity Recognition Goal: detect diagnoses, measurements, procedures in the text of the EHR ● supervised: train a sequence tagging model ○ pick a model: lots of literature, but mostly some variant of Bi-LSTM CRF ○ (manually) annotate data ● pro: domain adaptation & custom features ● con: requires training data & medical expertise Roller et al., Detecting Named Entities and Relations in German Clinical Reports, GSCL 2017 Murty et al., Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking, ACL 2018 Lample et al., Neural Architectures for Named Entity Recognition, NAACL-HLT 2016 Indication: Acute hypoxia. Relapsed AML, GVHD, and renal failure with new hypoxia with clear chest x-ray.
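For the supervised route, manually annotated spans are usually turned into per-token tags before a sequence tagger is trained. The sketch below shows one common encoding (BIO), assuming token-level span annotations. The label name `DIAGNOSIS` and the annotations themselves are made up for illustration, and the tagging model is not shown.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the mention
    return tags


# Hypothetical annotation of a fragment of the example sentence.
tokens = ["Acute", "hypoxia", ".", "Relapsed", "AML"]
spans = [(0, 2, "DIAGNOSIS"), (4, 5, "DIAGNOSIS")]
tags = spans_to_bio(tokens, spans)
```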
  • 11. Step 1: Entity Recognition Goal: detect diagnoses, measurements, procedures in the text of the EHR ● weakly labeled: keyword matching ○ walk over the text and look up every span in a dictionary ○ keep all spans that have at least one entity candidate ● pro: no need to annotate data ● con: noise, type and recall issues Murty et al., Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking, ACL 2018 Kolitsas et al., End-to-End Neural Entity Linking, CoNLL 2018 Wiatrak & Iso-Sipilä, Simple Hierarchical Multi-Task Neural End-To-End Entity Linking for Biomedical Text, LOUHI@EMNLP 2020 Indication: Acute hypoxia. Relapsed AML, GVHD, and renal failure with new hypoxia with clear chest x-ray.
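The keyword-matching idea can be sketched in a few lines: enumerate every token span up to some maximum length and keep the ones found in a dictionary. The function name, the `max_len` parameter and the toy dictionary with its entity ids are assumptions made for this illustration.

```python
def find_dictionary_spans(tokens, dictionary, max_len=3):
    """Return (start, end, text) for every token span that has a dictionary entry."""
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            text = " ".join(tokens[start:end]).lower()
            if text in dictionary:          # at least one entity candidate exists
                spans.append((start, end, text))
    return spans


# Hypothetical dictionary: surface form -> candidate entity ids (made up).
toy_dictionary = {
    "hypoxia": ["E1"],
    "renal failure": ["E2", "E3"],
    "aml": ["E4"],
}
tokens = "Relapsed AML , GVHD , and renal failure with new hypoxia".split()
spans = find_dictionary_spans(tokens, toy_dictionary)
```

"GVHD" has no entry in the toy dictionary and is therefore missed, which is the recall issue named on the slide.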
  • 12. Step 2: Candidate Retrieval Goal: fetch all relevant candidate entities from the ontology ● upper bound on performance: you can’t link what you don’t find Go-to solution: inverted index (Lucene) over entity descriptions ● make use of the analyzers that come with Lucene for tokenization, stemming, etc. ● craft the search query from the mention context ● keep the top 5, 10, 100 hits as candidates Pilz & Paaß, Collective Search for Concept Disambiguation, COLING 2012
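To make the retrieval step concrete, here is a minimal in-memory inverted index in plain Python. It is a stand-in for illustration, not Lucene: there is no stemming or relevance model, hits are scored by the number of shared tokens, and the entity ids and descriptions are made up.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace and commas."""
    return text.lower().replace(",", " ").split()


def build_index(descriptions):
    """Map every token to the set of entity ids whose description contains it."""
    index = defaultdict(set)
    for entity_id, text in descriptions.items():
        for token in tokenize(text):
            index[token].add(entity_id)
    return index


def retrieve(index, context, top_k=10):
    """Return up to top_k entity ids, ordered by number of tokens shared with the context."""
    hits = Counter()
    for token in set(tokenize(context)):
        for entity_id in index.get(token, ()):
            hits[entity_id] += 1
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [entity_id for entity_id, _ in ranked[:top_k]]


# Hypothetical entity descriptions.
descriptions = {
    "E_ACUTE": "acute renal failure, sudden loss of kidney function",
    "E_CHRONIC": "chronic renal failure, gradual loss of kidney function",
    "E_HEADACHE": "headache, pain located anywhere on the head",
}
index = build_index(descriptions)
candidates = retrieve(index, "acute renal failure, increased creatinine")
```

Both renal failure entries come back as candidates, which leaves the actual decision to the ranking step.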
  • 13. Step 3: Candidate Ranking Goal: decide on the best candidate as target entity Rank by context similarity ● compare text representations of mention context and entity description (word2vec, topic distributions, etc.) ● but: medical ontologies often do not provide extensive descriptions Pilz & Paaß, From names to entities using thematic context distance, CIKM 2011
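Ranking by context similarity can be illustrated with the simplest possible text representation. The sketch below uses bag-of-words counts and cosine similarity as a stand-in for the word2vec or topic representations mentioned on the slide. The candidate ids and descriptions are made up.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words representation: token -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_context(context, candidates):
    """Return (score, entity_id) pairs, best first."""
    ctx = bow(context)
    scored = [(cosine(ctx, bow(desc)), eid) for eid, desc in candidates.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical candidates for the mention "stone".
candidates = {
    "STONE_UNIT": "unit of weight",
    "KIDNEY_STONE": "hard mineral deposit formed inside the kidney",
}
ranked = rank_by_context("lost a stone in weight", candidates)
```

With descriptions this short, a single shared word decides the ranking, which mirrors the slide's caveat about sparse ontology descriptions.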
  • 14. Step 3: Candidate Ranking Goal: decide on the best candidate as target entity Add type similarity from hierarchies ● Wikipedia: categories assigned to entities ● UMLS: use semantic types ○ distinguish a disease from the gene it is caused by ○ LATTE: finds a boost in linking performance when adding a type encoding learned from UMLS types Zhu et al., LATTE: Latent Type Modeling for Biomedical Entity Linking, AAAI 2020 UMLS® Reference Manual
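A very simple way to use type information is to add a bonus to candidates whose semantic type matches the type predicted for the mention. The sketch below is not the LATTE model, which learns latent type representations. It is a hand-weighted re-ranking under the assumption that a mention type is already available, and the scores, ids and `type_weight` value are made up.

```python
def rerank_with_types(scored_candidates, candidate_types, mention_type, type_weight=0.5):
    """Add a fixed bonus to candidates whose semantic type matches the mention type."""
    reranked = []
    for score, entity_id in scored_candidates:
        bonus = type_weight if candidate_types.get(entity_id) == mention_type else 0.0
        reranked.append((score + bonus, entity_id))
    return sorted(reranked, key=lambda s: (-s[0], s[1]))


# Hypothetical context scores: the gene narrowly beats the disease it causes.
scored = [(0.40, "GENE_X"), (0.35, "DISEASE_X")]
types = {"GENE_X": "Gene or Genome", "DISEASE_X": "Disease or Syndrome"}
reranked = rerank_with_types(scored, types, mention_type="Disease or Syndrome")
```

Once the type bonus is applied, the disease candidate overtakes the gene candidate.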
  • 15. Step 3: Candidate Ranking Goal: decide on the best candidate as target entity In a nutshell ● find expressive vector representations of mention-candidate pairs ● plug vectors into some function to rank them ○ Ranking SVM, specific loss functions in NN, … ● the information in the vector is more important than the algorithm!
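The nutshell view can be sketched as a feature function plus a scoring function. The three features and the hand-set weights below are assumptions made for illustration. In practice the weights would be learned, for example with a Ranking SVM or a neural ranking loss, which is not shown here.

```python
def features(mention, candidate):
    """Build a small feature vector for one mention-candidate pair."""
    m_tokens = set(mention["context"].lower().split())
    c_tokens = set(candidate["description"].lower().split())
    overlap = len(m_tokens & c_tokens) / max(len(m_tokens | c_tokens), 1)
    name_match = 1.0 if mention["text"].lower() == candidate["name"].lower() else 0.0
    type_match = 1.0 if mention["type"] == candidate["type"] else 0.0
    return [overlap, name_match, type_match]


def score(vector, weights):
    """Linear scoring function; the weights are where learning would come in."""
    return sum(w * x for w, x in zip(weights, vector))


def best_candidate(mention, candidates, weights):
    """Pick the candidate with the highest score for this mention."""
    return max(candidates, key=lambda c: score(features(mention, c), weights))


# Hypothetical mention and candidates.
mention = {"text": "renal failure", "context": "patients with renal failure", "type": "disease"}
candidates = [
    {"id": "C1", "name": "renal failure", "description": "failure of the kidneys", "type": "disease"},
    {"id": "C2", "name": "renal artery", "description": "artery supplying the kidneys", "type": "anatomy"},
]
best = best_candidate(mention, candidates, weights=[1.0, 1.0, 1.0])
```

Swapping the linear scorer for another ranking function leaves `features` untouched, which is where the slide locates most of the value.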
  • 16. Challenges with German data ● Data is scarce, nothing comparable to MIMIC-III or MedMentions exists ● Ontologies like UMLS are only available in English ● NLP for German is a tad harder ○ Common nouns look like named entities (upper case) ● … the notorious compound words ○ sensory sensation disorder: Schallempfindungsstörung ○ occlusion of the central retinal artery: Netzhautarterienverschluss Ideas? Let’s discuss!