National Center for Scientific Research “Demokritos”, Athens:
• Anastasios Nentidis
• Anastasia Krithara
• Georgios Katsimpras
• Georgios Paliouras
Barcelona Supercomputing Center (BSC):
• Antonio Miranda-Escalada
• Luis Gascó
• Salvador Lima-López
• Eulàlia Farré-Maduell
• Darryl Estrada
• Martin Krallinger
Overview of DisTEMIST at BioASQ: Automatic detection and normalization of diseases from clinical texts: results, methods, evaluation and multilingual resources
BioASQ @ CLEF
DisTEMIST corpus: doi.org/10.5281/zenodo.6408476
BioASQ - DisTEMIST: DISease TExt Mining Shared Task - krallinger.martin@gmail.com; antoniomiresc@gmail.com
Importance of disease NER/Text Mining
[Figure: map of the Spanish-speaking world. By Allice Hunter, File:Hispanophone global world map language.png, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=69323596]
Snomed CT
● Relevant for health structured data: clinical coding or indexing with “Diagnosis” codes (ICD-10)
● Relevant search query/IR: 20% of PubMed queries are related to diseases, disorders, and
anomalies (2nd most popular search after authors)
● Relevant to clinical use cases and IE: etiology (e.g. gene–disease relationships), clinical aspects
(diagnosis, prevention and treatment), etc.
● Key concept across multiple data sources/types: literature, clinical records, clinical trials, patents, or social media (e.g. SMM4H/COLING2022 track SocialDisNER: temu.bsc.es/socialdisner/)
Previous disease extraction & normalization efforts
[Timeline; axis: < 2010, 2010, 2013, 2015, 2017, 2019, 2020, 2021, 2022]
● i2b2/VA [Disease mention corpus]
● IxaMed-GS [Disease mention corpus in Spanish]
● DrugSemantics corpus [Disease mention corpus in Spanish]
● CodiEsp [Diagnosis mention corpus normalized to ICD-10 in Spanish]
● Campillos-Llanos corpus [Disease corpus partially normalized to UMLS in Spanish]
● UMLS Metathesaurus creation [Terminological resource] [1993]
● Snomed-CT creation [Terminological resource] [1999]
● MetaMap [Tool] [2001]
● NCBI-Disease Corpus [Disease mention corpus normalized to MeSH and OMIM]
● ShARe/CLEF eHealth [Disease corpus normalized to SNOMED CT]
● DNorm [Disease finding and normalization tool]
● i2b2/VA partial normalization [Disease mention corpus normalized to SNOMED CT and RxNorm]
● DisTEMIST
DisTEMIST/BioASQ track overview
DisTEMIST resources
Webpage: temu.bsc.es/distemist
DisTEMIST corpus: doi.org/10.5281/zenodo.6408476
DisTEMIST annotation guidelines: doi.org/10.5281/zenodo.6458078
DisTEMIST Multilingual Silver Standard: doi.org/10.5281/zenodo.6408476
DisTEMIST entity normalization cross-mappings: doi.org/10.5281/zenodo.6408476
DisTEMIST gazetteer: doi.org/10.5281/zenodo.6458114
DisTEMIST Silver Standard: TBP
DisTEMIST evaluation library:
github.com/TeMU-BSC/distemist_evaluation_library
DisTEMIST participant systems:
temu.bsc.es/distemist/participant-systems/
DisTEMIST YouTube playlist: tinyurl.com/distemist-playlist
DisTEMIST Corpus - data, format & annotation
DisTEMIST Corpus - Overview
● Manual entity mention labels/annotations
● Manual normalization or mapping to SNOMED CT codes
● Inter-Annotator Agreement (IAA): 82.3
● Random training and test split
[Figure: most common disease mentions]
DisTEMIST Multilingual Silver Standard
7 languages:
• English
• Italian
• Portuguese
• French
• Romanian
• Catalan
• Galician
DisTEMIST Multilingual Silver Standard
[Figure: side-by-side BRAT view of the Spanish Gold Standard and the English Silver Standard; the annotated mentions on both sides carry the same Snomed IDs: 416098002 and 13644009]
Online side-by-side browser (BRAT): temu.bsc.es/mDistemist/diff.xhtml#/translations/en/train/S0004-06142005000500011-1?diff=/gold-standard/train/
DisTEMIST participating teams
● Registrations: 159
● DisTEMIST-entities: 9 participating teams, 19 submissions
● DisTEMIST-linking: 7 teams, 13 submissions
DisTEMIST participant results
● MiF: micro-averaged F-score (main metric)
● MiP: micro-avg. Precision
● MiR: micro-avg. Recall
https://github.com/TeMU-BSC/distemist_evaluation_library
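The three metrics above can be illustrated with a minimal Python sketch. It assumes exact matching on a hypothetical key of (document id, start offset, end offset, label); the official evaluation library linked above may apply different matching rules, so this is an illustration of micro-averaging, not a reimplementation of that library.

```python
from typing import Iterable, Set, Tuple

# Hypothetical annotation key: (document id, start offset, end offset, label).
Annotation = Tuple[str, int, int, str]


def micro_prf(gold: Iterable[Annotation], pred: Iterable[Annotation]) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F-score) under exact matching.

    Micro-averaging pools true positives over all documents before dividing,
    so documents with many mentions weigh more than documents with few.
    """
    gold_set: Set[Annotation] = set(gold)
    pred_set: Set[Annotation] = set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy data (invented for illustration): one exact match, one boundary error, one miss.
gold = [("doc1", 0, 7, "ENF"), ("doc1", 20, 38, "ENF"), ("doc2", 5, 12, "ENF")]
pred = [("doc1", 0, 7, "ENF"), ("doc2", 5, 14, "ENF")]
mip, mir, mif = micro_prf(gold, pred)
print(f"MiP={mip:.3f} MiR={mir:.3f} MiF={mif:.3f}")
```

Under exact matching, the prediction with a shifted end offset counts as both a false positive and a false negative, which is why boundary errors are costly for long mentions.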
DisTEMIST top-performing teams methods
● Domain- and language-specific transformer-based LM
● Simple decoding layer: linear layer
● Two-step process: candidate generation & candidate ranking
● 2 dictionaries: DisTEMIST Gazetteer & UMLS
● 2 candidate generation engines based on different technologies
● Smart candidate combination using weights and thresholds
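One way to picture "candidate combination using weights and thresholds" is a weighted sum of per-generator scores followed by a cut-off. The Python sketch below is a generic illustration under that assumption; the generator names, weights, threshold and concept codes are all hypothetical, and the actual HPI-DHC system is described in Borchert et al.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (concept code, generator score in [0, 1])


def combine_candidates(
    generator_outputs: Dict[str, List[Candidate]],
    weights: Dict[str, float],
    threshold: float,
) -> List[Candidate]:
    """Merge candidates from several generators into one ranked list.

    Each generator's score is scaled by its weight, scores for the same
    concept code are summed, and codes below the threshold are dropped.
    """
    combined: Dict[str, float] = defaultdict(float)
    for name, candidates in generator_outputs.items():
        for code, score in candidates:
            combined[code] += weights[name] * score
    # Sort by descending combined score; break ties by code for determinism.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [(code, score) for code, score in ranked if score >= threshold]


# Hypothetical outputs of two candidate generators for a single mention.
outputs = {
    "string_match": [("CODE_A", 0.9), ("CODE_B", 0.4)],
    "embedding_search": [("CODE_A", 0.7), ("CODE_C", 0.8)],
}
ranked = combine_candidates(
    outputs,
    weights={"string_match": 0.6, "embedding_search": 0.4},
    threshold=0.25,
)
print(ranked)
```

A code proposed by both generators accumulates score from each, so agreement between the two engines pushes a candidate up the ranking, while weakly supported codes fall below the threshold.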
DISTEMIST-entities (E): PICUSLab. Moscato et al. Biomedical Spanish Language Models for entity recognition and linking at BioASQ DisTEMIST
DISTEMIST-linking (L): HPI-DHC. Borchert et al. HPI-DHC @ BioASQ DisTEMIST: Spanish Biomedical Entity Linking with Pre-trained Transformers and Cross-lingual Candidate Retrieval
[Figure: Tokenized input sentence → Biomedical Spanish RoBERTa → Linear Token Classification Layer → BIO labels]
Example: El (O) paciente (O) … Sarcoma (B-ENF) de (I-ENF) Kaposi (I-ENF) …
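The token classification layer emits one BIO label per token (B-ENF opens a disease mention, I-ENF continues it, O is outside). A minimal Python sketch of turning such labels back into mentions follows; the function name and the filler token "presenta" are illustrative assumptions, and real systems also have to merge subword pieces and map tokens to character offsets.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str, str]  # (start token, end token exclusive, type, text)


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Span]:
    """Group per-token BIO labels into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    ent_type: Optional[str] = None
    # A trailing "O" sentinel closes an entity that runs to the end of the sentence.
    for i, label in enumerate(labels + ["O"]):
        continues = label.startswith("I-") and ent_type == label[2:]
        if start is not None and not continues:
            spans.append((start, i, ent_type, " ".join(tokens[start:i])))
            start, ent_type = None, None
        # Open a new entity on B-, or leniently on an I- that has no open entity.
        if label.startswith("B-") or (label.startswith("I-") and start is None):
            start, ent_type = i, label[2:]
    return spans


tokens = ["El", "paciente", "presenta", "Sarcoma", "de", "Kaposi", "."]
labels = ["O", "O", "O", "B-ENF", "I-ENF", "I-ENF", "O"]
mentions = bio_to_spans(tokens, labels)
print(mentions)
```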
DisTEMIST Spanish Silver Standard
Definition: DisTEMIST participants made predictions for the test set plus 2,750 background set documents (oncology, cardiology, urology, nephrology, etc.).
Harmonized background set predictions → DisTEMIST Spanish Silver Standard corpus.
Statistics: 615,235 disease predictions (146,396 unique).
Usage: the DisTEMIST Spanish Silver Standard is useful for improving deep learning systems (shown in the table below), as well as for creating a disease Knowledge Graph.
System improvement using
Silver Standard data!
Documents | Mentions | Unique mentions | Tokens
2,750 | 615,235 | 146,396 | 1,206,711
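The slides do not say how the participants' background set predictions were harmonized. One common option, shown here purely as an assumption, is agreement voting: keep a span only if enough systems predicted it. The system names, span key and vote threshold in this Python sketch are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]  # (document id, start offset, end offset)


def harmonize_by_vote(system_predictions: Dict[str, Set[Span]], min_votes: int) -> List[Span]:
    """Keep the spans predicted by at least `min_votes` systems.

    Each system contributes at most one vote per span because its
    predictions are stored as a set.
    """
    votes = Counter(span for spans in system_predictions.values() for span in spans)
    return sorted(span for span, count in votes.items() if count >= min_votes)


# Toy predictions from three hypothetical systems.
system_predictions = {
    "system_a": {("doc1", 10, 27), ("doc1", 40, 52)},
    "system_b": {("doc1", 10, 27), ("doc2", 3, 9)},
    "system_c": {("doc1", 10, 27), ("doc1", 40, 52)},
}
silver = harmonize_by_vote(system_predictions, min_votes=2)
print(silver)
```

Raising `min_votes` trades recall for precision in the resulting silver annotations, which matters when they are reused as extra training data.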
Conclusions
• Increasing participation in the Spanish task of BioASQ (5 [2020], 7 [2021], 9 [2022], …)
• Increasing interest in Spanish clinical NLP tasks
• DisTEMIST Resources
○ DisTEMIST Corpus: Disease entity Gold Standard corpus normalised to SNOMED-CT.
○ DisTEMIST Multilingual Silver Standard Corpus: Disease entity corpora normalised to SNOMED-CT in several languages.
○ DisTEMIST Spanish Silver Standard (from participants’ predictions)
• Disease Entity Recognition difficulties:
○ Complex entities (4%)
○ Heterogeneous diseases
○ Mention length: e.g. "tortuosidad de tronco celíaco y arteria hepática" (tortuosity of the celiac trunk and hepatic artery)
Future directions
• Expand the DisTEMIST Multilingual Silver Standard to other languages.
• Correct the DisTEMIST Multilingual Silver Standard to generate a Gold Standard subset of each language to create high-quality benchmarks in the seven languages.
• Prepare the same Gold Standard corpus for other medical entities (symptoms and medical procedures).
Acknowledgements
DisTEMIST Participants & DisTEMIST Scientific Committee
BioASQ organisers
● Anastasios Nentidis
● Georgios Katsimpras
● Anastasia Krithara
● Georgios Paliouras
CLEF organisers
Hospital 12 de Octubre
● Pablo Serrano
BSC Text Mining Unit
Funding:
• Plan de Tecnologías del Lenguaje
• AI4PROFHEALTH (PID2020-119266RA-I00)
• BioMATDB Horizon Europe Grant Agreement No 101058779
Thank you!
Questions?
