Aisha Kalsoom
The Medline database contained approx. 15 million scientific abstracts, with a
growth rate of about 400,000 articles per year.
Tools and techniques that help researchers cope with this information overload
are therefore needed.
Named entity recognition (NER) tools can be applied to find all kinds of
entities, such as gene or protein names, diseases and drugs, mutations, or
properties of protein structures.
Identifying proteins or genes is important for uncovering protein
interaction networks.
Concepts, meaning and representation
• Names in text represent real-life concepts in our mind
• The concept denoted by a gene name is usually not clearly defined
• There is no community-wide agreement on how to name a particular gene
[Figure labels: Supermarket; Sonic Hedgehog gene in human; p53; 2WRU]
Many genes and proteins have more than one name
• Clones in the mapping phase of the Human Genome Project had up to 15 different names
• FLT4 has four names: PCL, FLT41, LMPH1A, VEGFR3
Inconsistent use of variations of names
• Cbp/p300-interactive transactivator
• CCAAT/enhancer binding protein, C/EBP alpha
Multi-word names
• In the BioCreative corpus of expert-tagged gene names, 53% of all names consist of more than one token
• Example: Human T-cell leukaemia lymphotropic virus type 1 Tax protein
Acronyms are homonyms
• SEC stands for
  • surface epithelial cell
  • size exclusion chromatography
  • selenocysteine
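The naming problems above can be illustrated with a minimal Python sketch. It is not a real NER system, only a dictionary lookup built from the examples on the slides. The tables SYNONYMS and ACRONYMS, the function names, and the case-insensitive matching are all assumptions made for illustration.

```python
import re

# Hypothetical synonym table, filled only with the FLT4 example from the slides.
SYNONYMS = {
    "FLT4": ["PCL", "FLT41", "LMPH1A", "VEGFR3"],
}

# Hypothetical acronym table, filled only with the SEC example from the slides.
ACRONYMS = {
    "SEC": [
        "surface epithelial cell",
        "size exclusion chromatography",
        "selenocysteine",
    ],
}


def build_lookup(synonyms):
    """Map every known name (upper-cased) to one canonical gene symbol."""
    lookup = {}
    for canonical, aliases in synonyms.items():
        for name in [canonical, *aliases]:
            lookup[name.upper()] = canonical
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def tag_gene_names(text, lookup=LOOKUP):
    """Return (surface form, canonical symbol, offset) for each known name.

    Matching is case-insensitive here, which is an assumption: in real
    gene nomenclature, case can carry meaning.
    """
    hits = []
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        canonical = lookup.get(match.group().upper())
        if canonical is not None:
            hits.append((match.group(), canonical, match.start()))
    return hits


def expansions(acronym):
    """List all known long forms of an acronym; more than one means it is ambiguous."""
    return ACRONYMS.get(acronym.upper(), [])


def multi_token_share(names):
    """Fraction of names that consist of more than one whitespace-separated token."""
    if not names:
        return 0.0
    return sum(1 for name in names if len(name.split()) > 1) / len(names)
```

For example, tag_gene_names("VEGFR3 and Flt4 were both mentioned.") maps both mentions to FLT4, while expansions("SEC") returns three candidates, so a dictionary alone cannot decide which meaning is intended. Single-token matching also misses multi-word names, which is one reason plain lookup falls short on biomedical text.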
• Leser, U. and Hakenberg, J. (2005), ‘What makes a gene name? Named entity
recognition in the biomedical literature’, Briefings in Bioinformatics, Vol. 6(4), pp.
357-369.
• http://www.bioinformatics.org/textknowledge/acronym.php?textfield=SEC&sub=search
• http://www.rcsb.org/pdb/explore/explore.do?structureId=2WRU

Name Entity Recognition problems in biomedical literature
