Named Entity Recognition, Concept Normalization and Clinical Coding:
Overview of the Cantemist Track for Cancer Text Mining in Spanish,
Corpus, Guidelines, Methods and Results
Antonio Miranda-Escalada, Eulàlia Farré, Martin Krallinger, Barcelona Supercomputing Center
José Antonio López Martín, Hospital 12 Octubre
antonio.miranda@bsc.es
Cantemist task overview at IberLEF workshop (SEPLN 2020)
temu.bsc.es/cantemist
tinyurl.com/yxdazqfm
doi.org/10.5281/zenodo.3878178
Cantemist
Cantemist Scientific Committee
➢ Ashish Tendulkar, Google Research
➢ Tristan Naumann, Microsoft Research Healthcare NExT, USA
➢ Parminder Bhatia, Amazon Health AI, USA
➢ Kirk Roberts, School of Biomedical Informatics, University of Texas Health Science Center, USA
➢ Irene Spasic, School of Computer Science & Informatics, co-Director of the Data Innovation Research Institute, Cardiff University, UK
➢ Alfonso Valencia Herrera, Barcelona Supercomputing Center (BSC-CNS), Spain
➢ Hercules Dalianis, Department of Computer and Systems Sciences, Stockholm University, Sweden
➢ Kevin Bretonnel Cohen, Colorado School of Medicine, USA; LIMSI, CNRS, Université Paris-Saclay, France
➢ Karin Verspoor, School of Computing and Information Systems, Health and Biomedical Informatics Centre, University of Melbourne, Australia
➢ Aurélie Névéol, LIMSI-CNRS, Université Paris-Sud, France
➢ Goran Nenadic, Department of Computer Science, University of Manchester
➢ Zhiyong Lu, Deputy Director for Literature Search, National Center for Biotechnology Information (NCBI)
➢ Antonio Martinez, Head Pathology, Director National EQAS GCP, Spanish Society of Pathology, SEAP-IAP
➢ Mauro Oruezabal, Head of Medical Oncology Service, Hospital Universitario Rey Juan Carlos, Spain
➢ Carlos Luis Parra Calderón, Head of Technological Innovation, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville, Spain
Biomedical Text Mining - Cantemist: tinyurl.com/yxdazqfm
Precision medicine
Past medical shared tasks in Spanish
Shared task | Conference | Year | Description | Links
BARR | IberEval | 2017 | Abbreviations in medical documents |
BARR2 | IberEval | 2018 | Abbreviations in medical documents | https://temu.bsc.es/BARR2/organization.html
DIANN | IberEval | 2018 | Disability annotation on biomedical domain documents | http://nlp.uned.es/diann/
eHealth-KD | IberLEF | 2018-20 | Semantic relations in health-related sentences | knowledge-learning.github.io/ehealthkd-2020/ ; knowledge-learning.github.io/ehealthkd-2019/
WMT19 | WMT19 - ACL | 2019 | Biomedical Translation Task | doi.org/10.5281/zenodo.3562535 ; statmt.org/wmt19/biomedical-translation-task.html
MEDDOCAN | IberLEF | 2019 | Anonymization of medical documents | temu.bsc.es/meddocan/
PharmaCoNER | BioNLP-EMNLP | 2019 | Recognition of drugs, medications and chemical substances in medical texts | temu.bsc.es/pharmaconer/
MESINESP | CLEF BioASQ | 2020 | Automatic indexing of medical literature summaries | temu.bsc.es/mesinesp/
CodiEsp | CLEF eHealth | 2020 | Clinical case coding | temu.bsc.es/codiesp/
Cantemist | IberLEF | 2020 | Tumor morphology named entity recognition, normalization and coding | temu.bsc.es/cantemist
Current scenario
Cancer causes 1 in 6 deaths worldwide
There are many unstructured text sources in oncology:
➢ Scientific literature
➢ Patents
➢ Clinical case reports
➢ Biobank free-text metadata
➢ Pathology reports
➢ Oncology reports
Clinical NLP and cancer
1. Use cases
➢ Create databases from information in cancer literature
➢ Carry out population-level epidemiologic studies
➢ Identify treatment/diagnostic gaps
➢ Precision oncology
2. NLP systems (pioneers)
3. Needs
➢ Annotated corpora
➢ Controlled terminologies
International Classification of Diseases for Oncology
➢ Statistical Classification of tumor topography and morphology
➢ Domain-specific extension of the International Classification of Diseases (ICD)
➢ Created originally in 1976 [last update 2013 - heavily used worldwide, also in Spain]
➢ eCIE-O is the Spanish edition
➢ Lingua franca of pathologists with an extensive use within tumor registries
Code structure: histology / behaviour, degree
Example: adenocarcinoma, well differentiated: 8140/31
➢ 8140: tumor/cell type (adeno-)
➢ 3: behaviour (carcinoma)
➢ 1: differentiation (well differentiated)
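As a rough illustration of the code layout in the adenocarcinoma example, the sketch below splits a morphology code into its histology, behaviour and differentiation parts. The names `Morphology`, `parse_morphology` and `describe` are hypothetical. The label tables follow the standard ICD-O-3 behaviour and grade digits rather than anything stated on the slide, and the regular expression is an assumption that only covers the plain numeric form.

```python
import re
from typing import NamedTuple, Optional

# Standard ICD-O-3 behaviour digits (assumption: not listed on the slide).
BEHAVIOUR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic",
}

# Standard ICD-O-3 differentiation grades for solid tumors (assumption).
GRADE = {
    "1": "well differentiated",
    "2": "moderately differentiated",
    "3": "poorly differentiated",
    "4": "undifferentiated",
    "9": "not determined",
}


class Morphology(NamedTuple):
    histology: str        # four digits, e.g. "8140"
    behaviour: str        # one digit after the slash, e.g. "3"
    grade: Optional[str]  # optional sixth digit, e.g. "1"


# Four histology digits, a slash, one behaviour digit, optional grade digit.
_CODE = re.compile(r"^(\d{4})/(\d)(\d)?$")


def parse_morphology(code: str) -> Morphology:
    """Split a code such as '8140/31' into its three parts."""
    match = _CODE.match(code.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a plain numeric morphology code: {code!r}")
    return Morphology(match.group(1), match.group(2), match.group(3))


def describe(code: str) -> str:
    """Return a short human-readable description of the code parts."""
    m = parse_morphology(code)
    behaviour = BEHAVIOUR.get(m.behaviour, "unknown behaviour")
    if m.grade is None:
        return f"histology {m.histology}, {behaviour}"
    grade = GRADE.get(m.grade, "unknown grade")
    return f"histology {m.histology}, {behaviour}, {grade}"


print(describe("8140/31"))
```

Codes without a sixth digit, such as the 8010/3 used in the subtask examples, parse with `grade` set to `None`.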
Cantemist subtasks
➢ Teams may submit up to 5 runs for each subtask

NER subtask (Named Entity Recognition): finding tumor morphology mentions
● Prediction example: “Carcinoma” (position 3332 - 3341)

Normalization subtask: finding and normalizing tumor morphology mentions to ICD-O
● Prediction example: “Carcinoma” (position 3332 - 3341) - 8010/3

ICD-O coding subtask (clinical coding: indexing documents): returning, for each document, a ranked list of codes
● Prediction example: 8010/3
Evaluation
Clinical case: “Antecedente de haber presentado un melanoma maligno en el muslo derecho” (History of malignant melanoma in the right thigh)

Subtask | Manual Gold Standard | Automatic Prediction | Evaluation
NER | character offset: 36 - 51 | character offset: 36 - 63 | Incorrect (micro-average F1 = 0)
Normalization | character offset: 36 - 51, CIE-O code: 8720/3 | character offset: 36 - 63, CIE-O code: 8720/3 | Incorrect (micro-average F1 = 0)
Coding | CIE-O code: 8720/3 | CIE-O code: 8720/3 | Correct (MAP = 1)
https://github.com/TeMU-BSC/cantemist-evaluation-library/
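The melanoma example can be reproduced with a small sketch of the two metrics named on the slide. This is not the official evaluation library linked above. It assumes exact matching of (document, start, end) tuples for NER, the same tuples plus the code for normalization, and mean average precision over ranked code lists for coding. The function names and the document id `doc1` are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def micro_f1(gold: Set[Hashable], pred: Set[Hashable]) -> float:
    """Micro-averaged F1 over exact-match items pooled across documents."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_precision(gold_codes: Set[str], ranked: List[str]) -> float:
    """Average precision of one ranked list (assumed free of duplicates)."""
    if not gold_codes:
        return 0.0
    hits = 0
    total = 0.0
    for rank, code in enumerate(ranked, start=1):
        if code in gold_codes:
            hits += 1
            total += hits / rank
    return total / len(gold_codes)


def mean_average_precision(gold: Dict[str, Set[str]],
                           pred: Dict[str, List[str]]) -> float:
    """Mean of per-document average precision over the gold documents."""
    if not gold:
        return 0.0
    scores = [average_precision(codes, pred.get(doc, []))
              for doc, codes in gold.items()]
    return sum(scores) / len(scores)


# NER: the predicted span is longer than the gold span, so nothing matches.
ner_f1 = micro_f1({("doc1", 36, 51)}, {("doc1", 36, 63)})

# Normalization: the code is right, but the span still differs.
norm_f1 = micro_f1({("doc1", 36, 51, "8720/3")}, {("doc1", 36, 63, "8720/3")})

# Coding: only the document-level code list is compared.
coding_map = mean_average_precision({"doc1": {"8720/3"}},
                                    {"doc1": ["8720/3"]})

print(ner_f1, norm_f1, coding_map)
```

Under these assumptions an over-long span counts as a miss in both mention-level subtasks, while the document-level coding subtask still credits the correct code.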
Generated resources
https://doi.org/10.5281/zenodo.3878178

Gold Standard Cantemist Corpus
● Spanish oncology clinical cases
● Annotated by clinical experts
● Currently extending it to 1,900 documents
● Brat and TSV format
https://doi.org/10.5281/zenodo.3773228

Cantemist guidelines
● Annotating morphology neoplasms
● Mapping annotations to eCIE-O
https://doi.org/10.5281/zenodo.4010899

Cantemist Silver Standard
● Automatic predictions of Cantemist participants on a corpus of additional clinical case reports

● Documents: 1,301 (501 + 500 + 300)
● Tokens: 1,093,501
● Manual annotations: 16,030
● Unique codes: 850
Documents: 4,932
Gold Standard Cantemist corpus example
Brat: NER subtask & Normalization subtask
TSV: ICD-O coding subtask
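For readers unfamiliar with Brat, the sketch below reads the general Brat standoff layout: tab-separated text-bound lines starting with `T`, and note lines starting with `#` that point back to a `T` id. The label `MORFOLOGIA_NEOPLASIA`, the use of note lines to carry the ICD-O code, and the names `Annotation` and `parse_brat` are assumptions made for illustration, not details taken from the slide.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Annotation(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str
    code: Optional[str]  # filled from a note line, if one exists


def parse_brat(ann_text: str) -> List[Annotation]:
    """Parse contiguous text-bound annotations and their note lines."""
    spans: Dict[str, Tuple[str, int, int, str]] = {}
    codes: Dict[str, str] = {}
    for line in ann_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        if line.startswith("T"):
            parts = fields[1].split(" ")
            # Discontinuous spans ("start end;start end") are skipped here.
            if len(parts) != 3 or ";" in fields[1]:
                continue
            label, start, end = parts
            spans[fields[0]] = (label, int(start), int(end), fields[2])
        elif line.startswith("#"):
            # Note header looks like "AnnotatorNotes T1".
            header = fields[1].split(" ")
            if len(header) == 2:
                codes[header[1]] = fields[2]
    return [Annotation(ann_id, *span, codes.get(ann_id))
            for ann_id, span in spans.items()]


# Hypothetical file content built from the "Carcinoma" prediction example.
sample = (
    "T1\tMORFOLOGIA_NEOPLASIA 3332 3341\tCarcinoma\n"
    "#1\tAnnotatorNotes T1\t8010/3\n"
)
annotations = parse_brat(sample)
print(annotations)
```

The mention-level fields would feed the NER and normalization subtasks, while the coding subtask only needs the set of codes per document.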
Participation
● 66 registrations
● 25 submissions
● 19 papers
● 121 novel systems
● 16 countries
Many teams, multilingual and diverse
[Survey charts: Cantemist track difficulty; Adaptation of previous system; Participation future Cantemist; Software product/startup; Multilinguality of systems; Release of systems]
Results
● NER subtask: 11 teams with F1 > 0.80
● Norm subtask: 6 teams with F1 > 0.75
● ICD-O coding subtask: highly competitive results
Top participants runs
Generated software - team systems
Team Code link
LasigeBioTM https://github.com/lasigeBioTM/CANTEMIST-Participation
Hulat-UC3M https://github.com/ssantamaria94/CANTEMIST-Participation
ICB-UMA https://github.com/guilopgar/CANTEMIST-2020
Tong Wang https://github.com/18720936539/CANTEMIST
Kathrync https://github.com/kathrynchapman/CANTEMIST2020
Recognai https://github.com/recognai/cantemist-ner
Generated NER, normalization and coding software for Spanish
temu.bsc.es/cantemist
Conclusions
● Cross-disciplinary community involvement: 9+3+3 (9 research, 3 industry & 3 clinical authorities in the committee)
● Global community involvement: 16 countries registered in a shared task on clinical coding of Spanish documents
● Post-workshop Gold Standard: 1900 documents, with clinical cases joining oncology and Covid (https://doi.org/10.5281/zenodo.3773228)
● The shift in paradigm has settled down: 100% of participants that used machine learning reported employing deep learning
● Data-hungry methods
Methods
Data
Participants
Scientific committee
Need of HPC for NLP
Joint efforts, synergies & collaborations: antonio.miranda@bsc.es ; martin.krallinger@gmail.com
Thank you!
● All task participants
● Cantemist organizers: Martin Krallinger, Eulàlia Farré
● IberLEF organizers
● Jose Antonio Lopez-Martin (Hospital 12 de Octubre) and the Sociedad Española de Oncología Médica (SEOM)
● Plan de Tecnologías del Lenguaje
● BITAC: Gloria González, Toni Mas
● Cantemist Scientific Committee: Kirk Roberts, Parminder Bhatia, Irene Spasic, Tristan Naumann, Carlos Luis Parra, Ashish Tendulkar, Antonio Martinez, Alfonso Valencia, Hercules Dalianis, Kevin Bretonnel Cohen, Karin Verspoor, Aurélie Névéol, Goran Nenadic, Zhiyong Lu, Mauro Oruezabal
Cite: Antonio Miranda-Escalada, Eulàlia Farré and Martin Krallinger. Named Entity Recognition, Concept Normalization and Clinical Coding:
Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results, in: Proceedings of the Iberian
Languages Evaluation Forum (IberLEF 2020), CEUR Workshop Proceedings, 2020.

More Related Content

Similar to Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results (Cantemist task overview (Talk at IberLEF workshop of SEPLN 2020)

Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...Natalio Krasnogor
 
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
2016 ACS Semantic Approaches for Biochemical Knowledge DiscoveryMichel Dumontier
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects Carole Goble
 
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems MedicineNicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicineeventi-ITBbari
 
Nanopore sequencing 2019 patent landscape flyer
Nanopore sequencing 2019 patent landscape flyerNanopore sequencing 2019 patent landscape flyer
Nanopore sequencing 2019 patent landscape flyerKnowmade
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Bill Liu
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19Andrew Zhang
 
Link - Opportunities and Challenges for Research on Intelligent Algorithms fo...
Link - Opportunities and Challenges for Research on Intelligent Algorithms fo...Link - Opportunities and Challenges for Research on Intelligent Algorithms fo...
Link - Opportunities and Challenges for Research on Intelligent Algorithms fo...Universitat Politècnica de València
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentChris Southan
 
Carbon valley teaser for skolkovo vc 5.09.2013 v3
Carbon valley teaser for skolkovo vc  5.09.2013 v3Carbon valley teaser for skolkovo vc  5.09.2013 v3
Carbon valley teaser for skolkovo vc 5.09.2013 v3Yan Valle
 
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...Franck Michel
 
Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures Gianluca Tarasconi
 
2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI ConferenceMegan Sawchuk
 
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...Dr. Haxel Consult
 
SNOMED CT concept model for molecular pathology_final.pptx
SNOMED CT concept model for molecular pathology_final.pptxSNOMED CT concept model for molecular pathology_final.pptx
SNOMED CT concept model for molecular pathology_final.pptxHariHaran685388
 
Enhancing COVID-19 forecasting through deep learning techniques and fine-tuning
Enhancing COVID-19 forecasting through deep learning techniques and fine-tuningEnhancing COVID-19 forecasting through deep learning techniques and fine-tuning
Enhancing COVID-19 forecasting through deep learning techniques and fine-tuningIJECEIAES
 
El nuevo superordenador Mare Nostrum y el futuro procesador europeo
El nuevo superordenador Mare Nostrum y el futuro procesador europeoEl nuevo superordenador Mare Nostrum y el futuro procesador europeo
El nuevo superordenador Mare Nostrum y el futuro procesador europeoAMETIC
 

Similar to Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results (Cantemist task overview (Talk at IberLEF workshop of SEPLN 2020) (20)

Role of computers
Role of computersRole of computers
Role of computers
 
Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...Biological Apps: Rapidly Converging Technologies for Living Information Proce...
Biological Apps: Rapidly Converging Technologies for Living Information Proce...
 
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
2016 ACS Semantic Approaches for Biochemical Knowledge Discovery
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems MedicineNicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine
Nicola Ancona – Dall’Intelligenza Artificiale alla Systems Medicine
 
Nanopore sequencing 2019 patent landscape flyer
Nanopore sequencing 2019 patent landscape flyerNanopore sequencing 2019 patent landscape flyer
Nanopore sequencing 2019 patent landscape flyer
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19
 
Link - Opportunities and Challenges for Research on Intelligent Algorithms fo...
Link - Opportunities and Challenges for Research on Intelligent Algorithms fo...Link - Opportunities and Challenges for Research on Intelligent Algorithms fo...
Link - Opportunities and Challenges for Research on Intelligent Algorithms fo...
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug Development
 
Carbon valley teaser for skolkovo vc 5.09.2013 v3
Carbon valley teaser for skolkovo vc  5.09.2013 v3Carbon valley teaser for skolkovo vc  5.09.2013 v3
Carbon valley teaser for skolkovo vc 5.09.2013 v3
 
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
 
Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures
 
2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference2016 Standardization of Laboratory Test Coding - PHI Conference
2016 Standardization of Laboratory Test Coding - PHI Conference
 
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
 
Demo Presentation Wageningen Text Mining Workshop 2007
Demo Presentation Wageningen Text Mining Workshop 2007Demo Presentation Wageningen Text Mining Workshop 2007
Demo Presentation Wageningen Text Mining Workshop 2007
 
SNOMED CT concept model for molecular pathology_final.pptx
SNOMED CT concept model for molecular pathology_final.pptxSNOMED CT concept model for molecular pathology_final.pptx
SNOMED CT concept model for molecular pathology_final.pptx
 
Enhancing COVID-19 forecasting through deep learning techniques and fine-tuning
Enhancing COVID-19 forecasting through deep learning techniques and fine-tuningEnhancing COVID-19 forecasting through deep learning techniques and fine-tuning
Enhancing COVID-19 forecasting through deep learning techniques and fine-tuning
 
2016 bmdid-mappings
2016 bmdid-mappings2016 bmdid-mappings
2016 bmdid-mappings
 
El nuevo superordenador Mare Nostrum y el futuro procesador europeo
El nuevo superordenador Mare Nostrum y el futuro procesador europeoEl nuevo superordenador Mare Nostrum y el futuro procesador europeo
El nuevo superordenador Mare Nostrum y el futuro procesador europeo
 

More from Martin Krallinger

MedProcNER/ProcTEMIST Shared Task on Clinical Procedure Detection and Normali...
MedProcNER/ProcTEMIST Shared Task on Clinical Procedure Detection and Normali...MedProcNER/ProcTEMIST Shared Task on Clinical Procedure Detection and Normali...
MedProcNER/ProcTEMIST Shared Task on Clinical Procedure Detection and Normali...Martin Krallinger
 
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...Martin Krallinger
 
Overview of DisTEMIST at BioASQ: Automatic detection and normalization of dis...
Overview of DisTEMIST at BioASQ: Automatic detection and normalization of dis...Overview of DisTEMIST at BioASQ: Automatic detection and normalization of dis...
Overview of DisTEMIST at BioASQ: Automatic detection and normalization of dis...Martin Krallinger
 
Mention detection, normalization & classification of species, pathogens, huma...
Mention detection, normalization & classification of species, pathogens, huma...Mention detection, normalization & classification of species, pathogens, huma...
Mention detection, normalization & classification of species, pathogens, huma...Martin Krallinger
 
PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity ...
PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity ...PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity ...
PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity ...Martin Krallinger
 

More from Martin Krallinger (6)

MedProcNER/ProcTEMIST Shared Task on Clinical Procedure Detection and Normali...
MedProcNER/ProcTEMIST Shared Task on Clinical Procedure Detection and Normali...MedProcNER/ProcTEMIST Shared Task on Clinical Procedure Detection and Normali...
MedProcNER/ProcTEMIST Shared Task on Clinical Procedure Detection and Normali...
 
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
 
Overview of DisTEMIST at BioASQ: Automatic detection and normalization of dis...
Overview of DisTEMIST at BioASQ: Automatic detection and normalization of dis...Overview of DisTEMIST at BioASQ: Automatic detection and normalization of dis...
Overview of DisTEMIST at BioASQ: Automatic detection and normalization of dis...
 
Mention detection, normalization & classification of species, pathogens, huma...
Mention detection, normalization & classification of species, pathogens, huma...Mention detection, normalization & classification of species, pathogens, huma...
Mention detection, normalization & classification of species, pathogens, huma...
 
PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity ...
PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity ...PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity ...
PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity ...
 
Terminologia krallinger
Terminologia krallingerTerminologia krallinger
Terminologia krallinger
 

Recently uploaded

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 

Recently uploaded (20)

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results (Cantemist task overview, talk at the IberLEF workshop of SEPLN 2020)

  • 1. Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results Antonio Miranda-Escalada, Eulàlia Farré, Martin Krallinger, Barcelona Supercomputing Center José Antonio López Martín, Hospital 12 Octubre antonio.miranda@bsc.es Cantemist task overview at IberLEF workshop (SEPLN 2020) temu.bsc.es/cantemist tinyurl.com/yxdazqfm doi.org/10.5281/zenodo.3878178 Cantemist
  • 2. Cantemist Scientific Committee
      ➢ Ashish Tendulkar, Google Research
      ➢ Tristan Naumann, Microsoft Research Healthcare NExT, USA
      ➢ Parminder Bhatia, Amazon Health AI, USA
      ➢ Kirk Roberts, School of Biomedical Informatics, University of Texas Health Science Center, USA
      ➢ Irene Spasic, School of Computer Science & Informatics, co-Director of the Data Innovation Research Institute, Cardiff University, UK
      ➢ Alfonso Valencia Herrera, Barcelona Supercomputing Center (BSC-CNS), Spain
      ➢ Hercules Dalianis, Department of Computer and Systems Sciences, Stockholm University, Sweden
      ➢ Kevin Bretonnel Cohen, Colorado School of Medicine, USA; LIMSI, CNRS, Université Paris-Saclay, France
      ➢ Karin Verspoor, School of Computing and Information Systems, Health and Biomedical Informatics Centre, University of Melbourne, Australia
      ➢ Aurélie Névéol, LIMSI-CNRS, Université Paris-Sud, France
      ➢ Goran Nenadic, Department of Computer Science, University of Manchester
      ➢ Zhiyong Lu, Deputy Director for Literature Search, National Center for Biotechnology Information (NCBI)
      ➢ Antonio Martinez, Head Pathology, Director National EQAS GCP, Spanish Society of Pathology, SEAP-IAP
      ➢ Mauro Oruezabal, Head of Medical Oncology Service, Hospital Universitario Rey Juan Carlos, Spain
      ➢ Carlos Luis Parra Calderón, Head of Technological Innovation, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville, Spain
      Biomedical Text Mining - Cantemist: tinyurl.com/yxdazqfm
  • 3. Precision medicine
  • 4. Past medical shared tasks in Spanish
      Shared task | Conference | Year | Description | Links
      BARR | IberEval | 2017 | Abbreviations in medical documents | -
      BARR2 | IberEval | 2018 | Abbreviations in medical documents | https://temu.bsc.es/BARR2/organization.html
      DIANN | IberEval | 2018 | Disability annotation on biomedical domain documents | http://nlp.uned.es/diann/
      eHealth-KD | IberLEF | 2018-20 | Semantic relations in health-related sentences | knowledge-learning.github.io/ehealthkd-2020/, knowledge-learning.github.io/ehealthkd-2019/
      WMT19 | WMT19 - ACL | 2019 | Biomedical Translation Task | doi.org/10.5281/zenodo.3562535, statmt.org/wmt19/biomedical-translation-task.html
      MEDDOCAN | IberLEF | 2019 | Anonymization of medical documents | temu.bsc.es/meddocan/
      PharmaCoNER | BioNLP-EMNLP | 2019 | Recognition of drugs, medications and chemical substances in medical texts | temu.bsc.es/pharmaconer/
      MESINESP | CLEF BioASQ | 2020 | Automatic indexing of medical literature summaries | temu.bsc.es/mesinesp/
      CodiEsp | CLEF eHealth | 2020 | Clinical case coding | temu.bsc.es/codiesp/
      Cantemist | IberLEF | 2020 | Tumor morphology named entity recognition, normalization and coding | temu.bsc.es/cantemist
  • 5. Current scenario
      Cancer causes 1 in 6 deaths worldwide
      There are many unstructured text sources in oncology:
      ➢ Scientific literature
      ➢ Patents
      ➢ Clinical case reports
      ➢ Biobanks free text metadata
      ➢ Pathology reports
      ➢ Oncology reports
  • 6. Clinical NLP and cancer
      Needs (3)
        ➢ Annotated corpora
        ➢ Controlled terminologies
      NLP systems (pioneers) (2)
      Use cases (1)
        ➢ Create databases from information in cancer literature
        ➢ Carry out population-level epidemiologic studies
        ➢ Identify treatment/diagnostic gaps
        ➢ Precision oncology
  • 7. International Classification of Diseases for Oncology
      ➢ Statistical classification of tumor topography and morphology
      ➢ Domain-specific extension of the International Classification of Diseases (ICD)
      ➢ Created originally in 1976 [last update 2013 - heavily used worldwide, also in Spain]
      ➢ eCIE-O is the Spanish edition
      ➢ Lingua franca of pathologists, with extensive use within tumor registries
      Code structure: histology / behaviour + degree
      Example: adenocarcinoma, well differentiated: 8140/31
        8140: histology, tumor/cell type (adeno-)
        3: behaviour (carcinoma)
        1: degree, differentiation (well differentiated)
  • 8. Cantemist subtasks
      ➢ Teams may submit up to 5 runs for each subtask
      NER subtask (Named Entity Recognition): finding tumor morphology mentions
        ● Prediction example: “Carcinoma” (position 3332 - 3341)
      Normalization subtask: finding and normalizing tumor morphology mentions to ICD-O
        ● Prediction example: “Carcinoma” (position 3332 - 3341) - 8010/3
      ICD-O coding subtask (clinical coding: indexing documents): returning, for each document, a ranked list of codes
        ● Prediction example: 8010/3
  • 9. Evaluation
      Clinical case: "Antecedente de haber presentado un melanoma maligno en el muslo derecho" (History of malignant melanoma in the right thigh)
      Manual Gold Standard vs. Automatic Prediction:
      ➢ NER: character offsets 36 - 51 vs. 36 - 63. Incorrect, micro-average F1 = 0
      ➢ Normalization: character offsets 36 - 51 vs. 36 - 63, CIE-O code 8720/3 in both. Incorrect, micro-average F1 = 0
      ➢ Coding: CIE-O code 8720/3 in both. Correct, MAP = 1
      Evaluation library: https://github.com/TeMU-BSC/cantemist-evaluation-library/
  • 10. Generated resources
      Gold Standard Cantemist Corpus
        ● Spanish oncology clinical cases
        ● Annotated by clinical experts
        ● Currently extending it to 1,900 documents
        ● Brat and TSV format
        ● Documents: 1,301 (501 + 500 + 300)
        ● Tokens: 1,093,501
        ● Manual annotations: 16,030
        ● Unique codes: 850
      Cantemist guidelines
        ● Annotating neoplasm morphology
        ● Mapping annotations to eCIE-O
      Cantemist Silver Standard
        ● Automatic predictions of Cantemist participants on a corpus of additional clinical case reports
        ● Documents: 4,932
      Links: https://doi.org/10.5281/zenodo.3878178, https://doi.org/10.5281/zenodo.3773228, https://doi.org/10.5281/zenodo.4010899
  • 11. Gold Standard Cantemist corpus example
      ➢ Brat: NER subtask & Normalization subtask
      ➢ TSV: ICD-O coding subtask
  • 12. Participation
      ● 66 registrations
      ● 25 submissions
      ● 19 papers
      ● 121 novel systems
      ● 16 countries
      Many teams, multilingual and diverse
      [Charts: Cantemist track difficulty; Adaptation of previous system; Participation future Cantemist; Software product/startup; Multilinguality of systems; Release of systems]
  • 13. Results
      ● NER subtask: 11 teams with F1 > 0.80
      ● Norm subtask: 6 teams with F1 > 0.75
      ● ICD-O coding subtask: highly competitive results
      [Figure: Top participants' runs]
  • 14. Generated software - team systems
      Team | Code link
      LasigeBioTM | https://github.com/lasigeBioTM/CANTEMIST-Participation
      Hulat-UC3M | https://github.com/ssantamaria94/CANTEMIST-Participation
      ICB-UMA | https://github.com/guilopgar/CANTEMIST-2020
      Tong Wang | https://github.com/18720936539/CANTEMIST
      Kathrync | https://github.com/kathrynchapman/CANTEMIST2020
      Recognai | https://github.com/recognai/cantemist-ner
      Generated NER, normalization and coding software for Spanish: temu.bsc.es/cantemist
  • 15. Conclusions
      Scientific committee: 9+3+3
        9 research, 3 industry & 3 clinical authorities in the committee
        ● Cross-disciplinary community involvement
      Participants: 16
        countries registered in a shared task on clinical coding of Spanish documents
        ● Global community involvement
      Data: 1900
        documents in the post-workshop Gold Standard, with clinical cases joining oncology and Covid
        ● https://doi.org/10.5281/zenodo.3773228
      Methods: 100%
        of the participants that used machine learning reported employing deep learning
        ● The shift in paradigm has settled down
        ● Data-hungry methods
  • 16. Need of HPC for NLP
      Joint efforts, synergies & collaborations: antonio.miranda@bsc.es ; martin.krallinger@gmail.com
  • 17. Thank you!
      All task participants
      Cantemist organizers
        ● Martin Krallinger, Eulàlia Farré
      IberLEF organizers
      Jose Antonio Lopez-Martin (Hospital 12 de Octubre) and the Sociedad Española de Oncología Médica (SEOM)
      Cantemist Scientific Committee
        ● Kirk Roberts
        ● Parminder Bhatia
        ● Irene Spasic
        ● Tristan Naumann
        ● Carlos Luis Parra
        ● Ashish Tendulkar
        ● Antonio Martinez
        ● Alfonso Valencia
        ● Hercules Dalianis
        ● Kevin Bretonnel Cohen
        ● Karin Verspoor
        ● Aurélie Névéol
        ● Goran Nenadic
        ● Zhiyong Lu
        ● Mauro Oruezabal
      Plan de Tecnologías del Lenguaje
      BITAC
        ● Gloria González, Toni Mas
      antonio.miranda@bsc.es ; martin.krallinger@gmail.com
  • 18. Antonio Miranda-Escalada, Eulàlia Farré, Martin Krallinger, Barcelona Supercomputing Center; José Antonio López Martín, Hospital 12 Octubre
      antonio.miranda@bsc.es
      temu.bsc.es/cantemist, tinyurl.com/yxdazqfm, doi.org/10.5281/zenodo.3878178
      Cite: Antonio Miranda-Escalada, Eulàlia Farré and Martin Krallinger. Named Entity Recognition, Concept Normalization and Clinical Coding: Overview of the Cantemist Track for Cancer Text Mining in Spanish, Corpus, Guidelines, Methods and Results, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), CEUR Workshop Proceedings, 2020.
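
The evaluation slide (slide 9) scores NER and normalization with micro-averaged F1 and coding with MAP. The sketch below is one plausible reading of those metrics, not the code of the official cantemist-evaluation-library. It assumes strict matching of whole annotation tuples, and the document id `doc1`, the function names, and the choice of which offset pair is the gold one are all hypothetical.

```python
from typing import Dict, List, Set


def micro_f1(gold: Set[tuple], pred: Set[tuple]) -> float:
    """Micro-averaged F1 over exact-match annotation tuples (assumed strict matching)."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_precision(gold_codes: Set[str], ranked: List[str]) -> float:
    """Average precision of one ranked code list against the gold code set."""
    hits, total, seen = 0, 0.0, set()
    for rank, code in enumerate(ranked, start=1):
        if code in gold_codes and code not in seen:
            hits += 1
            total += hits / rank
        seen.add(code)
    return total / len(gold_codes) if gold_codes else 0.0


def mean_average_precision(gold: Dict[str, Set[str]], pred: Dict[str, List[str]]) -> float:
    """Mean of the per-document average precision over all gold documents."""
    if not gold:
        return 0.0
    return sum(average_precision(codes, pred.get(doc, [])) for doc, codes in gold.items()) / len(gold)


# Slide example; which offset pair is gold is an assumption, "doc1" is hypothetical.
ner_gold = {("doc1", 36, 51)}
ner_pred = {("doc1", 36, 63)}
norm_gold = {("doc1", 36, 51, "8720/3")}
norm_pred = {("doc1", 36, 63, "8720/3")}
coding_gold = {"doc1": {"8720/3"}}
coding_pred = {"doc1": ["8720/3"]}

print(micro_f1(ner_gold, ner_pred))                      # offsets differ, no exact match
print(micro_f1(norm_gold, norm_pred))                    # same code, but offsets still differ
print(mean_average_precision(coding_gold, coding_pred))  # code is ranked first
```

Under strict matching, a prediction with the right code but a different span counts as both a false positive and a false negative, which is why the NER and normalization rows on the slide score 0 while the document-level coding row scores 1.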

Editor's Notes

  1. Welcome to the session, the last session of IberLEF. The task was organized by BSC (me, Eulàlia and Martin), together with a clinician from H12O, José Antonio López Martín.
  2. We had a considerable number of participants, but we only had time for 6 talks, so we are collecting the rest of the participants' talks in the YouTube playlist. Before starting, thanks to the scientific committee: they are experts from industry, academia, hospitals, etc., who helped us in defining the task, the evaluation, the proceedings, etc.
  3. Massive volume of clinical data. Most of it is unstructured. We want to use it to solve clinical problems. We want interoperability. COVID example: need for efficient search, retrieval, analysis and integration, as well as exploitation strategies, for a diversity of medical content types. If you structure that 80%, you can add it to the big-data pool to find patterns and use it for precision medicine. Spanish is spoken by > 572 million people (with 477 million native speakers). Large healthcare professional community communicating in Spanish (incl. practicing physicians and nursing, midwifery or pharmaceutical personnel). Active production of medical publications in Spanish worldwide, hosted in several databases like PubMed, IBECS, SCIELO, LILACS, MEDES,... Also other health-related content in Spanish: clinical trial databases (e.g. REEC), national health projects (ISCIII-FIS), patents, clinical practice guidelines, social media. There is a lot of unstructured data in oncology and pathology.
  4. Spain is one of the leading oncology research countries.
  5. One of the general conclusions is the need for high-quality manually annotated corpora to foster the evaluation and development of systems. Finish with: there is a need to structure oncological clinical data; to do it well, we need annotated resources. To do it even better, we must have interoperable results, so we need terminologies. Cite: https://pubmed.ncbi.nlm.nih.gov/19135551/ https://www.sciencedirect.com/science/article/pii/S1532046412001712 https://pubmed.ncbi.nlm.nih.gov/22822041/
  6. The World Health Organization maintains the International Classification of Diseases for Oncology, or ICD-O. It is key to retrieving structured information from clinical texts in oncology. Some tumor mentions contain a relevant modifier that the terminology does not include for that concept. In those cases, we append /H to the code. For example, in the file cc_onco158, we have the codes 8000/1 and 8000/1/H. 8000/1 corresponds to a mention of neoplasm (“neoplasia”, in Spanish). In the 8000/1/H case, the mention is (in Spanish) “neoplasia de estirpe epitelial”. The modifier “estirpe epitelial” is present in the ICD-O terminology for many tumors, but not as a modifier of the code 8000/1 specifically. We therefore consider it a relevant modifier and add the /H.
  7. It is a manually generated corpus with tumor mentions labelled by clinical experts following guidelines. So the process is reproducible and has quality control. This corpus contains clinical case reports only in Spanish language. It has 1300 annotated documents. Clinical experts have found mentions of tumor morphologies present in the ICD-O terminology in these 1300 documents.
  8. Despite the pandemic and a submission period over the summer, there is increasing interest in NLP for health: 66 registrations, 25 teams, 19 proceedings papers, 121 novel runs. Don't say anything about the graphs: no time.
  9. Cross-disciplinary: ML industry, etc. Global. Post-workshop gold standard: we are going to extend the corpus, which will include clinical cases that join oncology and COVID (and link). Deep learning. BSC Cantemist tagger: we will release our own tagger. Cantemist involved finding tumor histology mentions in clinical case reports. It focused on Spanish documents and introduced a terminology new to NLP shared tasks, ICD-O. Solving a quite domain-specific task in a language other than English, using a terminology not well known in NLP, could seem too niche. However, Cantemist has proven us wrong. We had strong support from the community, starting with the scientific committee, which united authorities from the top Spanish hospitals, industry leaders and renowned researchers. Participants came from countries all around the globe, and 20% of them came from industry. Considering the complexity of clinical texts and the scarcity of clinical-specific resources, the results were quite high. These results were obtained using deep learning. In particular, the most successful approaches (1) combined the latest word embeddings with LSTMs and CRFs and (2) incorporated BERT-like language models. These approaches are data-hungry, so if we want to apply our methods successfully in the clinical world, we need to continue generating datasets and resources in Spanish with the characteristics of clinical texts. Indeed, when we explored the difficult annotations, we observed that they corresponded to highly specific mentions, complex mentions that are not even included in the terminology, mentions infrequent in the training data, etc. In the subset of missed annotations, 8% of the codes contain an “H”, versus only 2% in the entire test set. 13.2% of the missed annotations include the sixth (differentiation) digit in their code, versus 5.6% in the entire test set. The median number of appearances of the missed codes in the training and development sets is 1, whereas for the test set codes it is 3. Finally, 20.8% of the missed annotations have the metastasis code (8000/6), while this code accounts for 34.6% of the complete test set.
  10. Conclusions: we want people to collaborate on future projects. Mention the use of computationally intensive AI approaches (deep learning): many successful systems use DL methodology, which needs heavy computation. One of the conclusions we can draw is that successful methods rely on computationally intensive AI approaches. One of the future ... is to provide HPC support for participants, and we are open to exploring synergies and collaborations for supporting HPC resources. Take-home message: successful methods would benefit from the use of HPC, and we are open to collaboration. Photo of the MN. As we are part of BSC and the Spanish Supercomputing Network, we are very open to collaborating.
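
The code layout described on slide 7 (histology / behaviour, plus an optional differentiation digit) and the /H convention from note 6 can be illustrated with a small parser. This is a minimal sketch based only on the codes that appear in this deck (8140/31, 8720/3, 8000/6, 8000/1/H). The regular expression, the `MorphologyCode` type and the function name are assumptions for illustration and may not cover every valid eCIE-O form.

```python
import re
from typing import NamedTuple, Optional


class MorphologyCode(NamedTuple):
    histology: str          # four digits, e.g. "8140"
    behaviour: str          # fifth digit, e.g. "3"
    grade: Optional[str]    # optional sixth (differentiation) digit
    has_h_modifier: bool    # True when the Cantemist "/H" suffix is present


# Assumed layout: DDDD/B, DDDD/BG, optionally followed by "/H".
_CODE_RE = re.compile(r"(\d{4})/(\d)(\d)?(/H)?")


def parse_morphology_code(code: str) -> MorphologyCode:
    """Split a morphology code string into its parts, or raise ValueError."""
    match = _CODE_RE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized morphology code: {code!r}")
    histology, behaviour, grade, h_suffix = match.groups()
    return MorphologyCode(histology, behaviour, grade, h_suffix is not None)


for example in ("8140/31", "8720/3", "8000/6", "8000/1/H"):
    print(example, parse_morphology_code(example))
```

Separating the parts this way makes it easy to reproduce the kind of breakdown given in note 9, such as counting how many missed codes carry the /H suffix or the sixth digit.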