Barcelona Supercomputing Center (BSC):
• Antonio Miranda-Escalada
• Luis Gascó
• Salvador Lima-López
• Eulàlia Farré-Maduell
• Darryl Estrada
• Martin Krallinger
Mention detection, normalization &
classification of species, pathogens,
humans and food in clinical
documents: Overview of the
LivingNER shared task and resources
Martin Krallinger
Head of Text Mining Unit, BSC
<mkrallin@bsc.es>
IberLEF @ SEPLN 2022 LivingNER corpus: doi.org/10.5281/zenodo.6376662
IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com
Importance of species information extraction
By Allice Hunter - File:Hispanophone global world map language.png, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=69323596
National Center for
Biotechnology Information
(NCBI) Taxonomy
How many species inhabit the earth
How many species do we know
Quantification of global species
richness
Taxonomic classification of species
Number of species in a taxonomic
group
Validation against well-known taxa
250 years of taxonomic classification
1.2 million species catalogued in a
central database
86% of species on Earth and 91% of
species in the ocean still await
description
Knowledge gap
Challenges
- Large collection of species that changes over time, with hierarchical relation types
- Homonymy with commonly used words, e.g. “Spot” (Leiostomus xanthurus) and “Permit” (Trachinotus falcatus)
- Homonymy with other medical entities (the word “goat” can refer to proteins found in human, zebrafish, rat and mouse)
- Ambiguous abbreviations, e.g. HBV can stand for both “Hepatitis B virus” and “Hepatitis B vaccine”
- Vernacular forms (common names)
- Incorrect case or misspellings (e.g. Bacterium coli, Bacillus coli and Escheria coli for Escherichia coli)
- Coordinations and nested expressions: “human immunodeficiency viruses types 1 and 2” refers to two distinct species names, “HIV type 1” and “HIV type 2”
- Role names (e.g. athletes, responders)
- Human mentions in the form of family members, etc.
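The misspelling and case challenges can be illustrated with a small sketch. This is not the LivingNER method: it assumes a hypothetical two-entry lexicon (`LEXICON`) and a hypothetical helper `normalize_mention`, and uses approximate string matching from Python's standard `difflib` module to map a noisy mention to a canonical name and its NCBI Taxonomy ID.

```python
import difflib

# Hypothetical mini-lexicon for illustration: canonical name -> NCBI Taxonomy ID.
LEXICON = {
    "Escherichia coli": 562,
    "Homo sapiens": 9606,
}


def normalize_mention(mention, lexicon=LEXICON, cutoff=0.8):
    """Return (canonical_name, tax_id) for the closest lexicon entry, or None.

    Matching is case-insensitive and tolerates small spelling errors;
    `cutoff` is the minimum difflib similarity ratio (an assumed value).
    """
    # Index the lexicon by lower-cased name so that case errors do not matter.
    lowered = {name.lower(): name for name in lexicon}
    matches = difflib.get_close_matches(
        mention.lower(), list(lowered), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    canonical = lowered[matches[0]]
    return canonical, lexicon[canonical]


print(normalize_mention("Escheria coli"))   # misspelt mention
print(normalize_mention("Bacterium coli"))  # too different from any entry
```

With this cutoff, “Escheria coli” resolves to the Escherichia coli entry, while “Bacterium coli” is too dissimilar to match. Variants of that kind would need an explicit synonym table rather than fuzzy matching.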
Previous SPECIES extraction and normalization efforts
Timeline (before 2000, 2000-2010, 2010-2021, 2022):
● ITIS (Integrated Taxonomic Information System) [Federal effort to provide consistent biological taxonomies] [1996]
● NCBI taxonomy [Terminological resource] [Federhen, 2012] [1997]
● The Catalogue of Life [Index of the world's species] [Bánki et al., 2022] [2001]
● LINNAEUS [Species mention and normalisation to NCBI taxonomy corpus and tool] [Gerner et al., 2010] [2010]
● Infectious Diseases (ID) task of BioNLP [Corpus and shared task] [Pyysalo et al., 2011] [2011]
● SPECIES [Species mention and normalisation to NCBI taxonomy corpus and tool] [Pafilis et al., 2013] [2014]
● Global Names Architecture database [organizes and cross-links electronic information about organisms] [Pyle et al., 2016] [2016]
● LivingNER [2022]
LivingNER overview
LivingNER resources
LivingNER corpus: doi.org/10.5281/zenodo.6376662
LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162
LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662
LivingNER terminology: doi.org/10.5281/zenodo.6390506
LivingNER Silver Standard:
LivingNER evaluation library:
github.com/tonifuc3m/livingner-evaluation-library
LivingNER participant systems:
temu.bsc.es/livingner/participant-systems/
LivingNER YouTube playlist:
https://www.youtube.com/channel/UCDsmS1pCCO8TW312wJq8aCQ/playlists
LivingNER Corpus: documents, format and annotation
LivingNER Corpus - Overview
● Diversity: primary care, dermatology, internal medicine, tropical medicine, endocrinology, neurology, ophthalmology, psychiatry, radiology, emergency medicine, cardiology, pediatrics, oncology, dentistry, ...
● Manual entity annotations, NCBI taxonomy mapping and application classification
● Inter-Annotator Agreement (IAA): 94.2
● Random training, validation and test split
[Figure: Most common SPECIES mentions]
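A random document-level split like the one mentioned above can be sketched as follows. The proportions, the seed and the function name `split_documents` are assumptions for illustration only; the slides do not state the actual split sizes or procedure.

```python
import random


def split_documents(doc_ids, train=0.5, valid=0.25, seed=0):
    """Randomly partition document IDs into train/validation/test lists.

    The fractions and the seed are illustrative assumptions, not the
    values used for the LivingNER corpus.
    """
    ids = sorted(doc_ids)               # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # seeded shuffle, independent of global state
    n_train = int(len(ids) * train)
    n_valid = int(len(ids) * valid)
    return (
        ids[:n_train],
        ids[n_train:n_train + n_valid],
        ids[n_train + n_valid:],        # everything left over goes to the test set
    )


train_ids, valid_ids, test_ids = split_documents(f"doc{i}" for i in range(100))
print(len(train_ids), len(valid_ids), len(test_ids))
```

Splitting by document ID rather than by sentence keeps all annotations of one clinical case in the same subset.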
LivingNER Multilingual Silver Standard
[Figure: Spanish Gold Standard vs. English Silver Standard; annotations labelled NCBI Tax ID: 11103 and NCBI Tax ID: 1311]
Online visualiser:
https://temu.bsc.es/mLivingNER/diff.xhtml#/translations/en/annotation_transfer/train/caso_clinico_radiologia942?diff=/gold-standard/train/
LivingNER participating teams
● Registrations: 56
● SPECIES NER track: 20 participating teams, 41 submissions
● SPECIES Norm track: 8 teams, 14 submissions
● Clinical Impact track: 5 teams, 6 submissions
LivingNER participant results
● MiF: micro-averaged F-score (main metric)
● MiP: micro-averaged precision
● MiR: micro-averaged recall
Evaluation library: github.com/tonifuc3m/livingner-evaluation-library
[Results tables: SPECIES NER, SPECIES Norm]
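For readers unfamiliar with the three metrics, the following is a minimal sketch of micro-averaging under an exact-match assumption. It is not the official evaluation library, whose matching rules may differ; the function name `micro_prf` and the tuple layout `(doc_id, start, end, label)` are hypothetical.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F-score.

    `gold` and `pred` are sets of (doc_id, start, end, label) tuples.
    Counts are pooled over all documents before the ratios are taken,
    which is what makes the scores micro-averaged. A prediction counts
    as correct only if it matches a gold annotation exactly (assumption).
    """
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data (invented for illustration): 4 gold mentions, 3 predictions, 2 correct.
gold = {("d1", 0, 5, "SPECIES"), ("d1", 10, 17, "HUMAN"),
        ("d2", 3, 9, "SPECIES"), ("d2", 20, 28, "SPECIES")}
pred = {("d1", 0, 5, "SPECIES"), ("d2", 3, 9, "SPECIES"),
        ("d2", 30, 35, "SPECIES")}
print(micro_prf(gold, pred))
```

For a normalization track, the same function could be applied to tuples that also carry the predicted NCBI Taxonomy ID, so that a mention is correct only if both its span and its code match.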
LivingNER participant results - Clinical Impact track
Conclusions
• Increasing interest in Spanish clinical NLP tasks
• LivingNER Resources
○ LivingNER Corpus: Species entity Gold Standard corpus mapped to NCBI Taxonomy.
○ LivingNER Multilingual Silver Standard Corpus: Species entity corpora normalised to NCBI Taxonomy in several languages.
○ LivingNER Spanish Silver Standard (from participants’ predictions)
Future directions
• Correct the LivingNER Multilingual Silver Standard to generate a Gold Standard subset for each language, creating high-quality benchmarks in the seven languages.
• The Clinical Impact track lacked sufficient training and test data; we plan to correct this in the future.
• Generate more granular annotations for the HUMAN mentions, which are needed for real-world applications.
[Figure: Actual examples of annotated species mentions and automatically recognized profession mentions.]
Acknowledgements
LivingNER Participants &
LivingNER Scientific Committee
IberLEF organisers
● Manuel
● Julio
● and all others
SEPLN organisers
Funding:
• Plan de Tecnologías del Lenguaje
• AI4PROFHEALTH (PID2020-119266RA-I00)
• BioMATDB Horizon Europe Grant
Agreement No 101058779
BSC Text Mining Unit
LivingNER resources
LivingNER corpus: doi.org/10.5281/zenodo.6376662
LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162
LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662
LivingNER terminology: doi.org/10.5281/zenodo.6390506
LivingNER Silver Standard:
LivingNER evaluation library:
github.com/tonifuc3m/livingner-evaluation-library
LivingNER participant systems:
temu.bsc.es/livingner/participant-systems/
LivingNER YouTube playlist:
https://youtube.com/playlist?list=PL5uSCzf1azhA_gMLC3DBZe6NvmMJiggTg
Questions?
IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com

More Related Content

Similar to Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources (talk at IberLEF @ SEPLN 2022)

SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...Martin Krallinger
 
Mansfield CV 2016 LinkedIN
Mansfield CV 2016 LinkedINMansfield CV 2016 LinkedIN
Mansfield CV 2016 LinkedINColin MANSFIELD
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET
 
Utility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsUtility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsBedirhan Ustun
 
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding , DNA fingerpr...
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding, DNA fingerpr...Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding, DNA fingerpr...
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding , DNA fingerpr...AnitaPoudel5
 
Exploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsExploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsNigel Collier
 
2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overviewdvreeman
 
Neuron Bio D. Juan M. Alfaro - Commercial Manager
Neuron Bio D. Juan M. Alfaro - Commercial ManagerNeuron Bio D. Juan M. Alfaro - Commercial Manager
Neuron Bio D. Juan M. Alfaro - Commercial ManagerFIBAO
 
Vectors, environment and society unit
Vectors, environment and society unitVectors, environment and society unit
Vectors, environment and society unitvaléry ridde
 
Personalized Oral Medicine
Personalized Oral MedicinePersonalized Oral Medicine
Personalized Oral MedicineHarold Slavkin
 
2012 03 20 - LOINC Introduction - AMIA KRS-WG
2012 03 20 - LOINC Introduction - AMIA KRS-WG2012 03 20 - LOINC Introduction - AMIA KRS-WG
2012 03 20 - LOINC Introduction - AMIA KRS-WGdvreeman
 
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...
Rapid Impact Assessment of Climatic and Physio-graphic Changes  on Flagship G...Rapid Impact Assessment of Climatic and Physio-graphic Changes  on Flagship G...
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...Arvinder Singh
 
2010 12 06 - LOINC Introduction
2010 12 06 - LOINC Introduction2010 12 06 - LOINC Introduction
2010 12 06 - LOINC Introductiondvreeman
 
Country Status Reports on Agricultural Biotechnology - Bangladesh
Country Status Reports on Agricultural Biotechnology - BangladeshCountry Status Reports on Agricultural Biotechnology - Bangladesh
Country Status Reports on Agricultural Biotechnology - Bangladeshapaari
 
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 MmatchmakingPNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 MmatchmakingRIICCHPeru
 
Knowledge curation for COVID-19
Knowledge curation for COVID-19Knowledge curation for COVID-19
Knowledge curation for COVID-19Sonja Aits
 
State of the WHO Family of International Classifications -2015
State of the WHO Family of International Classifications -2015State of the WHO Family of International Classifications -2015
State of the WHO Family of International Classifications -2015Bedirhan Ustun
 
Fish biodiversity and food supply: Species numbers in the wild and exploited;...
Fish biodiversity and food supply: Species numbers in the wild and exploited;...Fish biodiversity and food supply: Species numbers in the wild and exploited;...
Fish biodiversity and food supply: Species numbers in the wild and exploited;...WorldFish
 
Indo norway delhi_vishwas_28_oct2011_final
Indo norway delhi_vishwas_28_oct2011_finalIndo norway delhi_vishwas_28_oct2011_final
Indo norway delhi_vishwas_28_oct2011_finalVishwas Chavan
 

Similar to Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources (talk at IberLEF @ SEPLN 2022) (20)

R Obomsawin CV
R Obomsawin CVR Obomsawin CV
R Obomsawin CV
 
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
 
Mansfield CV 2016 LinkedIN
Mansfield CV 2016 LinkedINMansfield CV 2016 LinkedIN
Mansfield CV 2016 LinkedIN
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019
 
Utility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsUtility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information Systems
 
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding , DNA fingerpr...
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding, DNA fingerpr...Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding, DNA fingerpr...
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding , DNA fingerpr...
 
Exploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsExploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease Informatics
 
2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview
 
Neuron Bio D. Juan M. Alfaro - Commercial Manager
Neuron Bio D. Juan M. Alfaro - Commercial ManagerNeuron Bio D. Juan M. Alfaro - Commercial Manager
Neuron Bio D. Juan M. Alfaro - Commercial Manager
 
Vectors, environment and society unit
Vectors, environment and society unitVectors, environment and society unit
Vectors, environment and society unit
 
Personalized Oral Medicine
Personalized Oral MedicinePersonalized Oral Medicine
Personalized Oral Medicine
 
2012 03 20 - LOINC Introduction - AMIA KRS-WG
2012 03 20 - LOINC Introduction - AMIA KRS-WG2012 03 20 - LOINC Introduction - AMIA KRS-WG
2012 03 20 - LOINC Introduction - AMIA KRS-WG
 
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...
Rapid Impact Assessment of Climatic and Physio-graphic Changes  on Flagship G...Rapid Impact Assessment of Climatic and Physio-graphic Changes  on Flagship G...
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...
 
2010 12 06 - LOINC Introduction
2010 12 06 - LOINC Introduction2010 12 06 - LOINC Introduction
2010 12 06 - LOINC Introduction
 
Country Status Reports on Agricultural Biotechnology - Bangladesh
Country Status Reports on Agricultural Biotechnology - BangladeshCountry Status Reports on Agricultural Biotechnology - Bangladesh
Country Status Reports on Agricultural Biotechnology - Bangladesh
 
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 MmatchmakingPNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
 
Knowledge curation for COVID-19
Knowledge curation for COVID-19Knowledge curation for COVID-19
Knowledge curation for COVID-19
 
State of the WHO Family of International Classifications -2015
State of the WHO Family of International Classifications -2015State of the WHO Family of International Classifications -2015
State of the WHO Family of International Classifications -2015
 
Fish biodiversity and food supply: Species numbers in the wild and exploited;...
Fish biodiversity and food supply: Species numbers in the wild and exploited;...Fish biodiversity and food supply: Species numbers in the wild and exploited;...
Fish biodiversity and food supply: Species numbers in the wild and exploited;...
 
Indo norway delhi_vishwas_28_oct2011_final
Indo norway delhi_vishwas_28_oct2011_finalIndo norway delhi_vishwas_28_oct2011_final
Indo norway delhi_vishwas_28_oct2011_final
 

Recently uploaded

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 

Recently uploaded (20)

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources (talk at IberLEF @ SEPLN 2022)

  • 1. Barcelona Supercomputing Center (BSC): • Antonio Miranda-Escalada • Luis Gascó • Salvador Lima-López • Eulàlia Farré-Maduell • Darryl Estrada • Martin Krallinger Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources Martin Krallinger Head of Text Mining Unit, BSC <mkrallin@bsc.es> IberLEF @ SEPLN 2022 LivingNER corpus: doi.org/10.5281/zenodo.6376662 1
  • 2. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com Importance of species information extraction De − Allice Hunter - File:Hispanophone global world map language.png,CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=69323596 National Center for Biotechnology Information (NCBI) Taxonomy
  • 3. How many species inhabit the earth How many species do we know Quantification of global species richness Taxonomic classification of species Number of species in a taxonomic group Validation against well-known taxa 250 years of taxonomic classification 1.2 million species catalogued in a central database 86% of species on Earth and 91% of species in the ocean still await description Knowledge gap
  • 4. -Large collection of species, change over time, hierarchical relation types relation types -Homonymy with commonly used words, e.g.: “Spot” (Leiostomus xanthurus) and “Permit” (Trachinotus falcatus) -Homonymy with other medical entities (the word “goat” can refer to proteins found in human, zebrafish, rat and mouse. -Abbreviations are ambiguous, e.g.: HBV can be used for both “Hepatitis B virus” as well as “Hepatitis B vaccine” -Vernacular form (common names) - Incorrect case or misspelt (like, Bacterium coli, Bacillus coli and Escheria coli for Escherichia coli) - Coordinations, nested expressions: “human immunodeficiency viruses types 1 and 2”, refer to two distinct species names, “HIV type 1” and “HIV type 2” - Role names (e.g. athletes, responders) - Human mencions in the form of family members, etc…. Challenges
  • 5. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com Previous SPECIES extraction and normalization efforts ● LivingNER < 2000 2000-2010 2010-2021 2022 ● The Catalogue of Life [Index of the world's species] [Bánki et al., 2022] [2001] ●Infectious Diseases (ID) task of BioNLP [Corpus and shared task] [Pyysalo et al., 2011] [2011] ● SPECIES [Species mention and normalisation to NCBI taxonomy corpus and tool] [Pafilis et al., 2013] [2014] ● ITIS (Integrated Taxonomic Information System) [Federal effort to provide consistent biological taxonomies] [1996] ● NCBI taxonomy [Terminological resource] [Federhen, 2012] [1997] ● Global Names Architecture database [organizes and cross-links electronic information about organisms] [Pyle et al., 2016] [2016] ● LINNAEUS [Species mention and normalisation to NCBI taxonomy corpus and tool] [Gerner et al., 2010] [2010]
  • 6. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER overview
  • 7. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER resources LivingNER corpus: doi.org/10.5281/zenodo.6376662 LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162 LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662 LivingNER terminology: doi.org/10.5281/zenodo.6390506 LivingNER Silver Standard: LivingNER evaluation library: github.com/tonifuc3m/livingner-evaluation-library LivingNER participant systems: temu.bsc.es/livingner/participant-systems/ LivingNER YouTube playlist: https://www.youtube.com/channel/UCDsmS1pCCO8TW312wJq8aCQ/playlists
  • 8. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER Corpus: documents, format and annotation
  • 9. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER Corpus - Overview ● Diversity: Atención primaria, dermatología, medicina interna, medicina tropical, endocrinología, neurología, oftalmología, psiquiatría, radiología, urgencias, cardiología, pediatrita, oncología, odontología,.. ● Manual entity annotations, NCBI taxonomy mapping and application classification ● Inter-Annotator Agreement (IAA): 94.2 ● Random training, validation and test split Most common SPECIES mentions
  • 10. LivingNER Multilingual Silver Standard
  • 11. LivingNER Multilingual Silver Standard
      Figure: Spanish Gold Standard alongside English Silver Standard; NCBI Tax ID labels in the figure: 11103 (twice), 1311 (twice)
      Online visualiser: https://temu.bsc.es/mLivingNER/diff.xhtml#/translations/en/annotation_transfer/train/caso_clinico_radiologia942?diff=/gold-standard/train/
  • 12. LivingNER participating teams
      ● Registrations: 56
      ● SPECIES NER track: 20 participating teams, 41 submissions
      ● SPECIES Norm track: 8 teams, 14 submissions
      ● Clinical Impact track: 5 teams, 6 submissions
  • 13. LivingNER participant results
      ● MiF: micro-averaged F-score (main metric)
      ● MiP: micro-avg. Precision
      ● MiR: micro-avg. Recall
      Evaluation library: github.com/tonifuc3m/livingner-evaluation-library
      Tables: SPECIES NER, SPECIES Norm
  • 14. LivingNER participant results
      ● MiF: micro-averaged F-score (main metric)
      ● MiP: micro-avg. Precision
      ● MiR: micro-avg. Recall
      Evaluation library: github.com/tonifuc3m/livingner-evaluation-library
      Tables: SPECIES NER, SPECIES Norm
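The micro-averaged metrics named on the results slides (MiP, MiR, MiF) can be illustrated with a minimal sketch. This is not the code of the LivingNER evaluation library; it assumes strict matching, where a prediction counts only if its document, offsets and label all equal a gold annotation. The `Span` layout, the function name `micro_prf` and the toy document ids are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical annotation layout: (start offset, end offset, label).
Span = Tuple[int, int, str]


def micro_prf(
    gold: Dict[str, Set[Span]], pred: Dict[str, Set[Span]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score over all documents.

    Counts are pooled across documents before dividing, so documents
    with many mentions weigh more than documents with few.
    Assumes strict matching: a true positive needs identical
    document id, offsets and label.
    """
    true_pos = sum(len(gold.get(doc, set()) & spans) for doc, spans in pred.items())
    n_pred = sum(len(spans) for spans in pred.values())
    n_gold = sum(len(spans) for spans in gold.values())

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = {
        "doc1": {(0, 5, "SPECIES"), (10, 20, "HUMAN")},
        "doc2": {(3, 9, "SPECIES")},
    }
    pred = {
        "doc1": {(0, 5, "SPECIES"), (10, 18, "HUMAN")},  # second span has wrong end offset
        "doc2": {(3, 9, "SPECIES")},
        "doc3": {(1, 2, "SPECIES")},  # spurious document
    }
    p, r, f = micro_prf(gold, pred)
    print(f"MiP={p:.4f} MiR={r:.4f} MiF={f:.4f}")
```

On the toy data, two of four predictions match and two of three gold annotations are found, so precision is 2/4, recall is 2/3 and the F-score is their harmonic mean.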
  • 15. LivingNER participant results - Clinical Impact track
  • 16. Conclusions
      • Increasing interest in Spanish clinical NLP tasks
      • LivingNER Resources
          ○ LivingNER Corpus: Species entity Gold Standard corpus mapped to NCBI Taxonomy.
          ○ LivingNER Multilingual Silver Standard Corpus: Species entity corpora normalised to NCBI Taxonomy in several languages.
          ○ LivingNER Spanish Silver Standard (from participants’ predictions)
  • 17. Future directions
      • Correct the LivingNER Multilingual Silver Standard to generate a Gold Standard subset for each language, creating high-quality benchmarks in the seven languages.
      • The Clinical Impact track lacked enough training and test data; we plan to correct this issue in the future.
      ● Generate more granular annotations for the HUMAN mentions that are needed for real-world applications.
      Figure: Actual examples of annotated species mentions and automatically recognized profession mentions.
  • 18. Acknowledgements
      LivingNER Participants & LivingNER Scientific Committee
      IberLEF organisers
          ● Manuel
          ● Julio
          ● and all others
      SEPLN organisers
      Funding:
          • Plan de Tecnologías del Lenguaje
          • AI4PROFHEALTH (PID2020-119266RA-I00)
          • BioMATDB Horizon Europe Grant Agreement No 101058779
      BSC Text Mining Unit
  • 19. LivingNER resources
      ● LivingNER corpus: doi.org/10.5281/zenodo.6376662
      ● LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162
      ● LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662
      ● LivingNER terminology: doi.org/10.5281/zenodo.6390506
      ● LivingNER Silver Standard:
      ● LivingNER evaluation library: github.com/tonifuc3m/livingner-evaluation-library
      ● LivingNER participant systems: temu.bsc.es/livingner/participant-systems/
      ● LivingNER YouTube playlist: https://youtube.com/playlist?list=PL5uSCzf1azhA_gMLC3DBZe6NvmMJiggTg
  • 20. Questions?